How does C know what type to expect?

If all values are nothing more than one or more bytes, and no byte can contain metadata, how does the system keep track of what sort of number a byte represents? Looking into Two's Complement and Single Precision on Wikipedia reveals how these numbers can be represented in base two, but I'm still left wondering how the compiler or processor (not sure which I'm really dealing with here) determines that this byte must be a signed integer.
It is analogous to receiving an encrypted letter and, looking at my shelf of cyphers, wondering which one to grab. Some indicator is necessary.
If I think about what I might do to solve this problem, two solutions come to mind. Either I would claim an additional byte and use it to store a description, or I would allocate sections of memory specifically for numerical representations; a section for signed numbers, a section for floats, etc.
I'm dealing primarily with C on a Unix system but this may be a more general question.

how does the system keep track of what sort of number a byte represents?
"The system" doesn't. During translation, the compiler knows the types of the objects it's dealing with, and generates the appropriate machine instructions for dealing with those values.

Ooh, good question. Let's start with the CPU - assuming an Intel x86 chip.
It turns out the CPU does not know whether a byte is "signed" or "unsigned." So when you add two numbers - or do any operation - a "status register" flag is set.
Take a look at the "sign flag." When you add two numbers, the CPU does just that - adds the numbers and stores the result in a register. But the CPU also asks: "if instead we interpreted these numbers as two's complement signed integers, is the result negative?" If so, then that "sign flag" is set to 1.
So if your program cares about signed vs unsigned, writing in assembly, you would check the status of that flag and the rest of your program would perform a different task based on that flag.
So when you use signed int versus unsigned int in C, you are basically telling the compiler how (or whether) to use that sign flag.
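To make that concrete, here is a hedged sketch (mine, not the answer's): two functions whose bodies are the identical comparison, where only the declared signedness tells the compiler whether to emit a signed-style test (SF/OF, the jl/setl family on x86) or an unsigned-style test (CF, the jb/setb family):
#include <stdio.h>

int is_less_signed(int a, int b)             { return a < b; }  /* signed compare   */
int is_less_unsigned(unsigned a, unsigned b) { return a < b; }  /* unsigned compare */

int main(void) {
    /* the bit pattern 0xFFFFFFFF is -1 when signed, 4294967295 when unsigned */
    printf("%d\n", is_less_signed(-1, 0));             /* prints 1: -1 < 0          */
    printf("%d\n", is_less_unsigned(0xFFFFFFFFu, 0));  /* prints 0: 4294967295 >= 0 */
    return 0;
}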

The code that is executed has no information about the types. The only tool that knows the types is the compiler, at the time it compiles the code. Types in C are solely a compile-time restriction to prevent you from using the wrong type somewhere. While compiling, the C compiler keeps track of the type of each variable and therefore knows which type belongs to which variable.
This is the reason why you need to use format strings in printf, for example. printf has no chance of knowing what type it will get in the parameter list, as this information is lost. In languages like Go or Java you have a runtime with reflection capabilities, which makes it possible to get the type.
Suppose your compiled C code still had type information in it; then the resulting assembly language would need to check types. It turns out that the only thing close to a type in assembly is the size of an instruction's operands, determined by suffixes (in GAS). So all that is left of your type information is the size, and nothing more.
One example of an assembly-level language that does support types is Java VM bytecode, whose instructions encode the primitive type they operate on (iadd, dadd, and so on).
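To see the printf point in code (a minimal sketch of my own, not part of the answer above): at run time the variadic arguments are just bytes, so the format string is the only "type information" printf gets, and lying to it is undefined behaviour:
#include <stdio.h>

int main(void) {
    double d = 0.1;
    printf("%f\n", d);        /* correct: printf reads the bytes as a double         */
    /* printf("%d\n", d); */  /* wrong: would read those bytes as an int (undefined) */
    return 0;
}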

It is important to remember that C and C++ are high level languages. The compiler's job is to take the plain text representation of the code and build it into the platform specific instructions the target platform is expecting to execute. For most people using PCs this tends to be x86 assembly.
This is why C and C++ are so loose with how they define the basic data types. For example, most people say there are 8 bits in a byte. The standard only requires that a byte have at least 8 bits (CHAR_BIT >= 8) and recognizes it as the smallest addressable unit of data; there is nothing against some machine out there having, say, 16 bits per byte as its native interpretation of data.
So the interpretation of data is up to the instruction set of the processor. In many modern languages there is another abstraction on top of this, the Virtual Machine.
If you write your own scripting language it is up to you to define how you interpret your data in software.
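As a hedged sketch of what "interpreting your data in software" can look like, here is a toy tagged value of the kind an interpreter might use; the tag field is exactly the sort of run-time metadata that C itself never stores:
#include <stdio.h>

typedef enum { TAG_INT, TAG_DOUBLE } Tag;

typedef struct {
    Tag tag;            /* explicit run-time type descriptor */
    union {
        int    i;
        double d;
    } as;
} Value;

static void print_value(Value v) {
    switch (v.tag) {
    case TAG_INT:    printf("int: %d\n", v.as.i);    break;
    case TAG_DOUBLE: printf("double: %f\n", v.as.d); break;
    }
}

int main(void) {
    Value a = { TAG_INT,    { .i = -23 } };
    Value b = { TAG_DOUBLE, { .d = 0.1 } };
    print_value(a);
    print_value(b);
    return 0;
}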

In C, apart from the compiler, which knows perfectly well the types of the values it deals with, there is no system that knows about the type of a given value.
Note that C by itself doesn't bring any runtime type information system with it.
Take a look at the following example:
int i_var;
double d_var;

int main () {
    i_var = -23;
    d_var = 0.1;
    return 0;
}
In the code there are two different types of values involved: one to be stored as an integer and one to be stored as a double value.
The compiler that analyzes the code knows the exact types of both of them quite well. Here is the dump of a short fragment of the type information gcc held while generating code, obtained by passing -fdump-tree-all to gcc:
#1 type_decl name: #2 type: #3 srcp: <built-in>:0
chan: #4
#2 identifier_node strg: int lngt: 3
#3 integer_type name: #1 size: #5 algn: 32
prec: 32 sign: signed min : #6
max : #7
...
#5 integer_cst type: #11 low : 32
#6 integer_cst type: #3 high: -1 low : -2147483648
#7 integer_cst type: #3 low : 2147483647
...
#3805 var_decl name: #3810 type: #3 srcp: main.c:3
chan: #3811 size: #5 algn: 32
used: 1
...
#3810 identifier_node strg: i_var lngt: 5
Hunting down the #links you should clearly see that there really is a lot of information stored about the memory size, alignment constraints and allowed min and max values for the type "int" in nodes #1-3 and #5-7. (I left out the #4 node, as the mentioned "chan" entry is just used to chain up any type definitions in the generated tree.)
Regarding the variable declared at main.c line 3, it is known that it holds a value of type int, as seen by the type reference to node #3.
You will surely be able to hunt down the double entries and the ones for d_var in an experiment of your own, if you don't trust me that they are there too.
Taking a look at the generated assembler code (obtained by passing the -S switch to gcc), we can see how the compiler used this information during code generation:
.file "main.c"
.comm i_var,4,4
.comm d_var,8,8
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
movl $-23, i_var
fldl .LC0
fstpl d_var
movl $0, %eax
popl %ebp
ret
.size main, .-main
.section .rodata
.align 8
.LC0:
.long -1717986918
.long 1069128089
.ident "GCC: (Debian 4.4.5-8) 4.4.5"
.section .note.GNU-stack,"",@progbits
Taking a look at the assignment instructions, you will see that the compiler figured out the right instructions: "movl" to assign our int value and "fstpl" to assign our "double" value.
Nevertheless, beyond the instructions chosen at the machine level, there is no indication of the type of those values. Taking a look at the value stored at .LC0, the "double" value 0.1 was even broken down into two consecutive storage locations, each a long, to match the "types" the assembler knows.
As a matter of fact, breaking the value up this way was just one choice among other possibilities; using 8 consecutive values of "type" .byte would have done equally well.
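To verify the point yourself, here is a small sketch of mine (assuming a little-endian machine with IEEE 754 doubles, the same setup as the listing above): dumping the raw bytes of 0.1 as two 32-bit halves reproduces exactly the bit patterns stored at .LC0, 0x9999999a and 0x3fb99999 (the signed decimals -1717986918 and 1069128089):
#include <stdio.h>
#include <string.h>
#include <inttypes.h>

int main(void) {
    double d = 0.1;
    uint32_t halves[2];
    memcpy(halves, &d, sizeof d);    /* reinterpret the 8 bytes as 2 x 32 bits */
    printf("low:  0x%08" PRIx32 "\n", halves[0]);   /* 0x9999999a on x86 */
    printf("high: 0x%08" PRIx32 "\n", halves[1]);   /* 0x3fb99999 on x86 */
    return 0;
}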

Related

Why does Knuth use this clunky decrement?

I'm looking at some of Prof. Don Knuth's code, written in CWEB that is converted to C. A specific example is dlx1.w, available from Knuth's website
At one stage, the .len value of a struct nd[cc] is decremented, and it is done in a clunky way:
o,t=nd[cc].len-1;
o,nd[cc].len=t;
(This is a Knuth-specific question, so maybe you already know that "o," is a preprocessor macro for incrementing "mems", which is a running total of effort expended, as measured by accesses to 64-bit words.) The value remaining in "t" is definitely not used for anything else. (The example here is on line 665 of dlx1.w, or line 193 of dlx1.c after ctangle.)
My question is: why does Knuth write it this way, rather than
nd[cc].len--;
which he does actually use elsewhere (line 551 of dlx1.w):
oo,nd[k].len--,nd[k].aux=i-1;
(And "oo" is a similar macro for incrementing "mems" twice -- but there is some subtlety here, because .len and .aux are stored in the same 64-bit word. To assign values to S.len and S.aux, only one increment to mems would normally be counted.)
My only theory is that a decrement consists of two memory accesses: first to look up, then to assign. (Is that correct?) And this way of writing it is a reminder of the two steps. This would be unusually verbose of Knuth, but maybe it is an instinctive aide-memoire rather than didacticism.
For what it's worth, I've searched in CWEB documentation without finding an answer. My question probably relates more to Knuth's standard practices, which I am picking up bit by bit. I'd be interested in any resources where these practices are laid out (and maybe critiqued) as a block -- but for now, let's focus on why Knuth writes it this way.
A preliminary remark: with Knuth-style literate programming (i.e. when reading WEB or CWEB programs) the “real” program, as conceived by Knuth, is neither the “source” .w file nor the generated (tangled) .c file, but the typeset (woven) output. The source .w file is best thought of as a means to produce it (and of course also the .c source that's fed to the compiler). (If you don't have cweave and TeX handy, I've typeset some of these programs here; this program DLX1 is here.)
So in this case, I'd describe the location in the code as module 25 of DLX1, or subroutine "cover":
Anyway, to return to the actual question: note that this (DLX1) is one of the programs written for The Art of Computer Programming. Because reporting the time taken by a program in “seconds” or “minutes” becomes meaningless from year to year, he reports how long a program took as a number of “mems” plus “oops”, which is dominated by the “mems”, i.e. the number of memory accesses to 64-bit words (usually). So the book contains statements like “this program finds the answer to this problem in 3.5 gigamems of running time”. Further, the statements are intended to be fundamentally about the program/algorithm itself, not the specific code generated by a specific version of a compiler for certain hardware. (Ideally when the details are very important he writes the program in MMIX or MMIXAL and analyses its operations on the MMIX hardware, but this is rare.) Counting the mems (to be reported as above) is the purpose of inserting o and oo instructions into the program. Note that it's more important to get this right for the “inner loop” instructions that are executed a lot of times, such as everything in the subroutine cover in this case.
This is elaborated in Section 1.3.1′ (part of Fascicle 1):
Timing. […] The running time of a program depends not only on the clock rate but also on the number of functional units that can be active simultaneously and the degree to which they are pipelined; it depends on the techniques used to prefetch instructions before they are executed; it depends on the size of the random-access memory that is used to give the illusion of 2^64 virtual bytes; and it depends on the sizes and allocation strategies of caches and other buffers, etc., etc.
For practical purposes, the running time of an MMIX program can often be estimated satisfactorily by assigning a fixed cost to each operation, based on the approximate running time that would be obtained on a high-performance machine with lots of main memory; so that’s what we will do. Each operation will be assumed to take an integer number of υ, where υ (pronounced “oops”) is a unit that represents the clock cycle time in a pipelined implementation. Although the value of υ decreases as technology improves, we always keep up with the latest advances because we measure time in units of υ, not in nanoseconds. The running time in our estimates will also be assumed to depend on the number of memory references or mems that a program uses; this is the number of load and store instructions. For example, we will assume that each LDO (load octa) instruction costs µ + υ, where µ is the average cost of a memory reference. The total running time of a program might be reported as, say, 35µ+ 1000υ, meaning “35 mems plus 1000 oops.” The ratio µ/υ has been increasing steadily for many years; nobody knows for sure whether this trend will continue, but experience has shown that µ and υ deserve to be considered independently.
And he does of course understand the difference from reality:
Even though we will often use the assumptions of Table 1 for seat-of-the-pants estimates of running time, we must remember that the actual running time might be quite sensitive to the ordering of instructions. For example, integer division might cost only one cycle if we can find 60 other things to do between the time we issue the command and the time we need the result. Several LDB (load byte) instructions might need to reference memory only once, if they refer to the same octabyte. Yet the result of a load command is usually not ready for use in the immediately following instruction. Experience has shown that some algorithms work well with cache memory, and others do not; therefore µ is not really constant. Even the location of instructions in memory can have a significant effect on performance, because some instructions can be fetched together with others. […] Only the meta-simulator can be trusted to give reliable information about a program’s actual behavior in practice; but such results can be difficult to interpret, because infinitely many configurations are possible. That’s why we often resort to the much simpler estimates of Table 1.
Finally, we can use Godbolt's Compiler Explorer to look at the code generated by a typical compiler for this code. (Ideally we'd look at MMIX instructions but as we can't do that, let's settle for the default there, which seems to be x86-64 gcc 8.2.) I removed all the os and oos.
For the version of the code with:
/*o*/ t = nd[cc].len - 1;
/*o*/ nd[cc].len = t;
the generated code for the first line is:
movsx rax, r13d
sal rax, 4
add rax, OFFSET FLAT:nd+8
mov eax, DWORD PTR [rax]
lea r14d, [rax-1]
and for the second line is:
movsx rax, r13d
sal rax, 4
add rax, OFFSET FLAT:nd+8
mov DWORD PTR [rax], r14d
For the version of the code with:
/*o ?*/ nd[cc].len --;
the generated code is:
movsx rax, r13d
sal rax, 4
add rax, OFFSET FLAT:nd+8
mov eax, DWORD PTR [rax]
lea edx, [rax-1]
movsx rax, r13d
sal rax, 4
add rax, OFFSET FLAT:nd+8
mov DWORD PTR [rax], edx
which as you can see (even without knowing much about x86-64 assembly) is simply the concatenation of the code generated in the former case (except for using register edx instead of r14d), so it's not as if writing the decrement in one line has saved you any mems. In particular, it would not be correct to count it as a single one, especially in something like cover that is called a huge number of times in this algorithm (dancing links for exact cover).
So the version as written by Knuth is correct, for its goal of counting the number of mems. He could also write oo,nd[cc].len--; (counting two mems) as you observed, but perhaps it might look like a bug at first glance in that case. (BTW, in your example in the question of oo,nd[k].len--,nd[k].aux=i-1; the two mems come from the load and the store in --; not two stores.)
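For concreteness, the o and oo macros described in the question presumably boil down to something like the following; this is a hedged sketch for illustration, not the actual definitions from dlx1.w:
typedef unsigned long long ullng;
ullng mems;            /* running total of 64-bit memory accesses */

#define o  mems++      /* "o, expr"  counts one access before doing expr */
#define oo mems += 2   /* "oo, expr" counts two accesses                 */

/* usage, matching the style quoted above:
 *   o, t = nd[cc].len - 1;    // one load
 *   o, nd[cc].len = t;        // one store
 */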
This whole practice seems to be based on a mistaken idea/model of how C works, that there's some correspondence between the work performed by the abstract machine and by the actual program as executed (i.e. the "C is portable assembler" fallacy). I don't think we can answer a lot more about why that exact code fragment appears, except that it happens to be an unusual idiom for counting loads and stores on the abstract machine as separate.

How can Microsoft say the size of a word in WinAPI is 16 bits?

I've just started learning the WinAPI. In the MSDN, the following explanation is provided for the WORD data type.
WORD
A 16-bit unsigned integer. The range is 0 through 65535 decimal.
This type is declared in WinDef.h as follows:
typedef unsigned short WORD;
Simple enough, and it matches with the other resources I've been using for learning, but how can it be definitively said that it is 16 bits? The C data types page on Wikipedia specifies
short / short int / signed short / signed short int
Short signed integer
type. Capable of containing at least the [−32767, +32767] range; thus,
it is at least 16 bits in size.
So the size of a short could very well be 32 bits according to the C Standard. But who decides what bit sizes are going to be used anyway? I found a practical explanation here. Specifically, the line:
...it depends on both processors (more specifically, ISA, instruction
set architecture, e.g., x86 and x86-64) and compilers including
programming model.
So it's the ISA then, which makes sense I suppose. This is where I get lost. Taking a look at the Windows page on Wikipedia, I see this in the side bar:
Platforms
ARM, IA-32, Itanium, x86-64, DEC Alpha, MIPS, PowerPC
I don't really know what these are but I think these are processors, each which would have an ISA. Maybe Windows supports these platforms because all of them are guaranteed to use 16 bits for an unsigned short? This doesn't sound quite right, but I don't really know enough about this stuff to research any further.
Back to my question: How is it that the Windows API can typedef unsigned short WORD; and then say WORD is a 16-bit unsigned integer when the C Standard itself does not guarantee that a short is always 16 bits?
Simply put, a WORD is always 16 bits.
As a WORD is always 16 bits, but an unsigned short is not, a WORD is not always an unsigned short.
For every platform that the Windows SDK supports, the windows header file contains #ifdef style macros that can detect the compiler and its platform, and associate the Windows SDK defined types (WORD, DWORD, etc) to the appropriately sized platform types.
This is WHY the Windows SDK actually uses internally defined types, such as WORD, rather than using language types: so that they can ensure that their definitions are always correct.
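As a hedged illustration of that mechanism, a platform-conditional typedef could look something like this; the macro names are invented for the example and are not taken from any real Windows header:
/* hypothetical sketch only -- real Windows headers are organised differently,
 * but the idea is the same: pick whichever language type is 16 bits wide on
 * the current compiler/platform and call it WORD */
#if defined(HYPOTHETICAL_PLATFORM_WITH_16BIT_SHORT)
typedef unsigned short WORD;   /* short happens to be 16 bits here */
#elif defined(HYPOTHETICAL_PLATFORM_WITH_16BIT_INT)
typedef unsigned int WORD;     /* int happens to be 16 bits here   */
#else
#error "no 16-bit unsigned type known for this platform"
#endif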
The Windows SDK that ships with Microsoft toolchains is possibly lazy here, as Microsoft C++ toolchains always use 16-bit unsigned shorts.
I would not expect the windows.h that ships with Visual Studio C++ to work correctly if dropped into GCC, clang, etc., as so many details, including the mechanism of importing DLLs using the .lib files that the Platform SDK distributes, are Microsoft-specific implementations.
A different interpretation is that:
Microsoft says a WORD is 16 bits. If "someone" wants to call a windows API, they must pass a 16 bit value where the API defines the field as a WORD.
Microsoft also possibly says, in order to build a valid windows program, using the windows header files present in their Windows SDK, the user MUST choose a compiler that has a 16bit short.
The c++ spec does not say that compilers must implement shorts as 16 bits - Microsoft says the compiler you choose to build windows executables must.
There was originally an assumption that all code intended to run on Windows would be compiled with Microsoft's own compiler - or a fully compatible compiler. And that's the way it worked. Borland C: Matched Microsoft C. Zortech's C: Matched Microsoft C. gcc: not so much, so you didn't even try (not to mention there were no runtimes, etc.).
Over time this concept got codified and extended to other operating systems (or perhaps the other operating systems got it first) and now it is known as an ABI - Application Binary Interface - for a platform, and all compilers for that platform are assumed (in practice, required) to match the ABI. And that means matching expectations for the sizes of integral types (among other things).
An interesting related question you didn't ask is: So why is 16-bits called a word? Why is 32-bits a dword (double word) on our 32- and now 64-bit architectures where the native machine "word" size is 32- or 64-, not 16? Because: 80286.
In the Windows headers there are a lot of #defines that, based on the platform, ensure a WORD is 16 bits, a DWORD is 32, etc. In some cases in the past, I know they distributed a separate SDK for each platform. In any case there is nothing magic, just a mixture of proper #defines and headers.
The BYTE=8bits, WORD=16bits, and DWORD=32bits (double-word) terminology comes from Intel's instruction mnemonics and documentation for 8086. It's just terminology, and at this point doesn't imply anything about the size of the "machine word" on the actual machine running the code.
My guess:
Those C type names were probably originally introduced for the same reason that C99 standardized uint8_t, uint16_t, and uint32_t. The idea was probably to allow C implementations with an incompatible ABI (e.g. 16 vs. 32-bit int) to still compile code that uses the WinAPI, because the ABI uses DWORD rather than long or int in structs, and function args / return values.
Probably as Windows evolved, enough code started depending in various ways on the exact definition of WORD and DWORD that MS decided to standardize the exact typedefs. This diverges from the C99 uint16_t idea, where you can't assume that it's unsigned short.
As @supercat points out, this can matter for aliasing rules. e.g. if you modify an array of unsigned long[] through a DWORD*, it's guaranteed that it will work as expected. But if you modify an array of unsigned int[] through a DWORD*, the compiler might assume that didn't affect array values that it already had in registers. This also matters for printf format strings. (C99's <stdint.h> solution to that is preprocessor macros like PRIu32.)
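For reference, a quick sketch of the C99 approach mentioned above (my example, not from the WinAPI): because the underlying typedef is platform-dependent, the matching printf macro travels with it:
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    uint32_t n = 4000000000u;          /* exactly 32 bits on every platform       */
    printf("n = %" PRIu32 "\n", n);    /* PRIu32 expands to the correct specifier */
    return 0;
}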
Or maybe the idea was just to use names that match the asm, to make sure nobody was confused about the width of types. In the very early days of Windows, writing programs in asm directly, instead of C, was popular. WORD/DWORD makes the documentation clearer for people writing in asm.
Or maybe the idea was just to provide a fixed-width types for portable code. e.g. #ifdef SUNOS: define it to an appropriate type for that platform. This is all it's good for at this point, as you noticed:
How is it that the Windows API can typedef unsigned short WORD; and then say WORD is a 16-bit unsigned integer when the C Standard itself does not guarantee that a short is always 16 bits?
You're correct, documenting the exact typedefs means that it's impossible to correctly implement the WinAPI headers in a system using a different ABI (e.g. one where long is 64bit or short is 32bit). This is part of the reason why the x86-64 Windows ABI makes long a 32bit type. The x86-64 System V ABI (Linux, OS X, etc.) makes long a 64bit type.
Every platform does need a standard ABI, though. struct layout, and even interpretation of function args, requires all code to agree on the size of the types used. Code from different version of the same C compiler can interoperate, and even other compilers that follow the same ABI. (However, C++ ABIs aren't stable enough to standardize. For example, g++ has never standardized an ABI, and new versions do break ABI compatibility.)
Remember that the C standard only tells you what you can assume across every conforming C implementation. The C standard also says that signed integers might be sign/magnitude, one's complement, or two's complement. Any specific platform will use whatever representation the hardware does, though.
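A sketch of how a platform ABI can pin down what the base standard leaves open (these particular checks pass on typical Windows and x86-64 Linux compilers, but nothing in the C standard alone guarantees them):
#include <limits.h>

/* C11 _Static_assert: compile-time checks of ABI assumptions */
_Static_assert(CHAR_BIT == 8, "this ABI uses 8-bit bytes");
_Static_assert(sizeof(short) * CHAR_BIT == 16, "this ABI uses 16-bit short");
_Static_assert(sizeof(int) * CHAR_BIT == 32, "this ABI uses 32-bit int");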
Platforms are free to standardize anything that the base C standard leaves undefined or implementation-defined. e.g. x86 C implementations allow creating unaligned pointers to exist, and even to dereference them. This happens a lot with __m128i vector types.
The actual names chosen tie the WinAPI to its x86 heritage, and are unfortunately confusing to anyone not familiar with x86 asm, or at least Windows's 16bit DOS heritage.
Some of the 8086 instruction mnemonics that include w for word and d for dword, cbw and cwd, were commonly used as setup for idiv signed division:
cbw: sign extend AL (byte) into AX (word)
cwd: sign extend AX (word) into DX:AX (dword), i.e. copy the sign bit of ax into every bit of dx.
These insns still exist and do exactly the same thing in 32bit and 64bit mode. (386 and x86-64 added extended versions, as you can see in those extracts from Intel's insn set reference.) There's also lodsw, rep movsw, etc. string instructions.
Besides those mnemonics, operand-size needs to be explicitly specified in some cases, e.g.
mov dword ptr [mem], -1, where neither operand is a register that can imply the operand-size. (To see what assembly language looks like, just disassemble something. e.g. on a Linux system, objdump -Mintel -d /bin/ls | less.)
So the terminology is all over the place in x86 asm, which is something you need to be familiar with when developing an ABI.
More x86 asm background, history, and current naming schemes
Nothing below this point has anything to do with WinAPI or the original question, but I thought it was interesting.
See also the x86 tag wiki for links to Intel's official PDFs (and lots of other good stuff). This terminology is still ubiquitous in Intel and AMD documentation and instruction mnemonics, because it's completely unambiguous in a document for a specific architecture that uses it consistently.
386 extended register sizes to 32bits, and introduced the cdq instruction: cdq (eax (dword) -> edx:eax (qword)). (Also introduced movsx and movzx, to sign- or zero-extend without needing to get the data into eax first.) Anyway, quad-word is 64bits, and was used even in pre-386 for double-precision memory operands for fld qword ptr [mem] / fst qword ptr [mem].
Intel still uses this b/w/d/q/dq convention for vector instruction naming, so it's not at all something they're trying to phase out.
e.g. the pshufd insn mnemonic (_mm_shuffle_epi32 C intrinsic) is Packed (integer) Shuffle Dword. psraw is Packed Shift Right Arithmetic Word. (FP vector insns use a ps (packed single) or pd (packed double) suffix instead of p prefix.)
As vectors get wider and wider, the naming starts to get silly: e.g. _mm_unpacklo_epi64 is the intrinsic for the punpcklqdq instruction: Packed-integer Unpack Low Quad-words to Double-Quad. (i.e. interleave 64bit low halves into one 128b). Or movdqu for Move Double-Quad Unaligned loads/stores (16 bytes). Some assemblers use o (oct-word) for declaring 16 byte integer constants, but Intel mnemonics and documentation always use dq.
Fortunately for our sanity, the AVX 256b (32B) instructions still use the SSE mnemonics, so vmovdqu ymm0, [rsi] is a 32B load, but there's no quad-quad terminology. Disassemblers that include operand-sizes even when it's not ambiguous would print vmovdqu ymm0, ymmword ptr [rsi].
Even the names of some AVX-512 extensions use the b/w/d/q terminology. AVX-512F (foundation) doesn't include all element-size versions of every instruction. The 8bit and 16bit element size versions of some instructions are only available on hardware that supports the AVX-512BW extension. There's also AVX-512DQ for extra dword and qword element-size instructions, including conversion between float/double and 64bit integers and a multiply with 64b x 64b => 64b element size.
A few new instructions use numeric sizes in the mnemonic
AVX's vinsertf128 / vextractf128 and similar, for inserting or extracting the high 128bit lane of a 256bit vector, could have used dq, but instead use 128.
AVX-512 introduces a few insn mnemonics with names like vmovdqa64 (vector load with masking at 64bit element granularity) or vshuff32x4 (shuffle 128b elements, with masking at 32bit element granularity).
Note that since AVX-512 has merge-masking or zero-masking for almost all instructions, even instructions that didn't used to care about element size (like pxor / _mm_xor_si128) now come in different sizes: _mm512_mask_xor_epi64 (vpxorq) (each mask bit affects a 64bit element), or _mm512_mask_xor_epi32 (vpxord). The no-mask intrinsic _mm512_xor_si512 could compile to either vpxorq or vpxord; it doesn't matter.
Most AVX512 new instructions still use b/w/d/q in their mnemonics, though, like VPERMT2D (full permute selecting elements from two source vectors).
Currently there are no platforms that support the Windows API but where unsigned short is not 16 bits.
If someone ever did make such a platform, the Windows API headers for that platform would not include the line typedef unsigned short WORD;.
You can think of MSDN pages as describing typical behaviour for MSVC++ on x86/x64 platforms.
The legacy of types like WORD predates Windows, going back to the days of MSDOS, following the types defined by MASM (whose name was later changed to ML). MASM's signed types, such as SBYTE, SWORD, SDWORD, and SQWORD, were not adopted by the Windows API.
QWORD / SQWORD in MASM probably wasn't defined until MASM / ML supported 80386.
A current reference:
http://msdn.microsoft.com/en-us/library/8t163bt0.aspx
Windows added types such as HANDLE, WCHAR, TCHAR, ... .
For Windows / Microsoft compilers, size_t is an unsigned integer the same size as a pointer: 32 bits in 32-bit mode, 64 bits in 64-bit mode.
The DB and DW data directives in MASM go back to the days of Intel's 8080 assembler.

Drawing a character in VGA memory with GNU C inline assembly

I'm learning to do some low level VGA programming in DOS with C and inline assembly. Right now I'm trying to create a function that prints out a character on screen.
This is my code:
//This is the characters BITMAPS
uint8_t characters[464] = {
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x20,0x20,0x20,0x20,0x00,0x20,0x00,0x50,
0x50,0x00,0x00,0x00,0x00,0x00,0x50,0xf8,0x50,0x50,0xf8,0x50,0x00,0x20,0xf8,0xa0,
0xf8,0x28,0xf8,0x00,0xc8,0xd0,0x20,0x20,0x58,0x98,0x00,0x40,0xa0,0x40,0xa8,0x90,
0x68,0x00,0x20,0x40,0x00,0x00,0x00,0x00,0x00,0x20,0x40,0x40,0x40,0x40,0x20,0x00,
0x20,0x10,0x10,0x10,0x10,0x20,0x00,0x50,0x20,0xf8,0x20,0x50,0x00,0x00,0x20,0x20,
0xf8,0x20,0x20,0x00,0x00,0x00,0x00,0x00,0x60,0x20,0x40,0x00,0x00,0x00,0xf8,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x60,0x60,0x00,0x00,0x08,0x10,0x20,0x40,0x80,
0x00,0x70,0x88,0x98,0xa8,0xc8,0x70,0x00,0x20,0x60,0x20,0x20,0x20,0x70,0x00,0x70,
0x88,0x08,0x70,0x80,0xf8,0x00,0xf8,0x10,0x30,0x08,0x88,0x70,0x00,0x20,0x40,0x90,
0x90,0xf8,0x10,0x00,0xf8,0x80,0xf0,0x08,0x88,0x70,0x00,0x70,0x80,0xf0,0x88,0x88,
0x70,0x00,0xf8,0x08,0x10,0x20,0x20,0x20,0x00,0x70,0x88,0x70,0x88,0x88,0x70,0x00,
0x70,0x88,0x88,0x78,0x08,0x70,0x00,0x30,0x30,0x00,0x00,0x30,0x30,0x00,0x30,0x30,
0x00,0x30,0x10,0x20,0x00,0x00,0x10,0x20,0x40,0x20,0x10,0x00,0x00,0xf8,0x00,0xf8,
0x00,0x00,0x00,0x00,0x20,0x10,0x08,0x10,0x20,0x00,0x70,0x88,0x10,0x20,0x00,0x20,
0x00,0x70,0x90,0xa8,0xb8,0x80,0x70,0x00,0x70,0x88,0x88,0xf8,0x88,0x88,0x00,0xf0,
0x88,0xf0,0x88,0x88,0xf0,0x00,0x70,0x88,0x80,0x80,0x88,0x70,0x00,0xe0,0x90,0x88,
0x88,0x90,0xe0,0x00,0xf8,0x80,0xf0,0x80,0x80,0xf8,0x00,0xf8,0x80,0xf0,0x80,0x80,
0x80,0x00,0x70,0x88,0x80,0x98,0x88,0x70,0x00,0x88,0x88,0xf8,0x88,0x88,0x88,0x00,
0x70,0x20,0x20,0x20,0x20,0x70,0x00,0x10,0x10,0x10,0x10,0x90,0x60,0x00,0x90,0xa0,
0xc0,0xa0,0x90,0x88,0x00,0x80,0x80,0x80,0x80,0x80,0xf8,0x00,0x88,0xd8,0xa8,0x88,
0x88,0x88,0x00,0x88,0xc8,0xa8,0x98,0x88,0x88,0x00,0x70,0x88,0x88,0x88,0x88,0x70,
0x00,0xf0,0x88,0x88,0xf0,0x80,0x80,0x00,0x70,0x88,0x88,0xa8,0x98,0x70,0x00,0xf0,
0x88,0x88,0xf0,0x90,0x88,0x00,0x70,0x80,0x70,0x08,0x88,0x70,0x00,0xf8,0x20,0x20,
0x20,0x20,0x20,0x00,0x88,0x88,0x88,0x88,0x88,0x70,0x00,0x88,0x88,0x88,0x88,0x50,
0x20,0x00,0x88,0x88,0x88,0xa8,0xa8,0x50,0x00,0x88,0x50,0x20,0x20,0x50,0x88,0x00,
0x88,0x50,0x20,0x20,0x20,0x20,0x00,0xf8,0x10,0x20,0x40,0x80,0xf8,0x00,0x60,0x40,
0x40,0x40,0x40,0x60,0x00,0x00,0x80,0x40,0x20,0x10,0x08,0x00,0x30,0x10,0x10,0x10,
0x10,0x30,0x00,0x20,0x50,0x88,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xf8,
0x00,0xf8,0xf8,0xf8,0xf8,0xf8,0xf8};
/**************************************************************************
* put_char *
* Print char *
**************************************************************************/
void put_char(int x ,int y,int ascii_char ,byte color){
__asm__(
"push %si\n\t"
"push %di\n\t"
"push %cx\n\t"
"mov color,%dl\n\t" //test color
"mov ascii_char,%al\n\t" //test char
"sub $32,%al\n\t"
"mov $7,%ah\n\t"
"mul %ah\n\t"
"lea $characters,%si\n\t"
"add %ax,%si\n\t"
"mov $7,%cl\n\t"
"0:\n\t"
"segCS %lodsb\n\t"
"mov $6,%ch\n\t"
"1:\n\t"
"shl $1,%al\n\t"
"jnc 2f\n\t"
"mov %dl,%ES:(%di)\n\t"
"2:\n\t"
"inc %di\n\t"
"dec %ch\n\t"
"jnz 1b\n\t"
"add $320-6,%di\n\t"
"dec %cl\n\t"
"jnz 0b\n\t"
"pop %cx\n\t"
"pop %di\n\t"
"pop %si\n\t"
"retn"
);
}
I'm guiding myself by this series of tutorials written in Pascal: http://www.joco.homeserver.hu/vgalessons/lesson8.html .
I changed the assembly syntax to suit the gcc compiler, but I'm still getting these errors:
Operand mismatch type for 'lea'
No such instruction 'segcs lodsb'
No such instruction 'retn'
EDIT:
I have been working on improving my code and at least now I see something on the screen. Here's my updated code:
/**************************************************************************
* put_char *
* Print char *
**************************************************************************/
void put_char(int x,int y){
int char_offset;
int l,i,j,h,offset;
j,h,l,i=0;
offset = (y<<8) + (y<<6) + x;
__asm__(
"movl _VGA, %%ebx;" // VGA memory pointer
"addl %%ebx,%%edi;" //%di points to screen
"mov _ascii_char,%%al;"
"sub $32,%%al;"
"mov $7,%%ah;"
"mul %%ah;"
"lea _characters,%%si;"
"add %%ax,%%si;" //SI point to bitmap
"mov $7,%%cl;"
"0:;"
"lodsb %%cs:(%%si);" //load next byte of bitmap
"mov $6,%%ch;"
"1:;"
"shl $1,%%al;"
"jnc 2f;"
"movb %%dl,(%%edi);" //plot the pixel
"2:\n\t"
"incl %%edi;"
"dec %%ch;"
"jnz 1b;"
"addl $320-6,%%edi;"
"dec %%cl;"
"jnz 0b;"
: "=D" (offset)
: "d" (current_color)
);
}
As you can see in the image above, I was trying to write the letter "S". The result is the green pixels that you see in the upper left corner of the screen. No matter what x and y I give the function, it always plots the pixels in that same spot.
Can anyone help me correct my code?
See below for an analysis of some things that are specifically wrong with your put_char function, and a version that might work. (I'm not sure about the %cs segment override, but other than that it should do what you intend).
Learning DOS and 16-bit asm isn't the best way to learn asm
First of all, DOS and 16-bit x86 are thoroughly obsolete, and are not easier to learn than normal 64-bit x86. Even 32-bit x86 is obsolete, but still in wide use in the Windows world.
32-bit and 64-bit code don't have to care about a lot of 16-bit limitations / complications like segments or limited register choice in addressing modes. Some modern systems do use segment overrides for thread-local storage, but learning how to use segments in 16-bit code is barely connected to that.
One of the major benefits to knowing asm is for debugging / profiling / optimizing real programs. If you want to understand how to write C or other high-level code that can (and actually does) compile to efficient asm, you'll probably be looking at compiler output. This will be 64-bit (or 32-bit). (e.g. see Matt Godbolt's CppCon2017 talk: “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” which has an excellent intro to reading x86 asm for total beginners, and to looking at compiler output).
Asm knowledge is useful when looking at performance-counter results annotating a disassembly of your binary (perf stat ./a.out && perf report -Mintel: see Chandler Carruth's CppCon2015 talk: "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!"). Aggressive compiler optimizations mean that looking at cycle / cache-miss / stall counts per source line are much less informative than per instruction.
Also, for your program to actually do anything, it has to either talk to hardware directly, or make system calls. Learning DOS system calls for file access and user input is a complete waste of time (except for answering the steady stream of SO questions about how to read and print multi-digit numbers in 16-bit code). They're quite different from the APIs in the current major OSes. Developing new DOS applications is not useful, so you'd have to learn another API (as well as ABI) when you get to the stage of doing something with your asm knowledge.
Learning asm on an 8086 simulator is even more limiting: 186, 286, and 386 added many convenient instructions like imul ecx, 15, making ax less "special". Limiting yourself to only instructions that work on 8086 means you'll figure out "bad" ways to do things. Other big ones are movzx / movsx, shift by an immediate count (other than 1), and push immediate. Besides performance, it's also easier to write code when these are available, because you don't have to write a loop to shift by more than 1 bit.
Suggestions for better ways to teach yourself asm
I mostly learned asm from reading compiler output, then making small changes. I didn't try to write stuff in asm when I didn't really understand things, but if you're going to learn quickly (rather than just evolve an understanding while debugging / profiling C), you probably need to test your understanding by writing your own code. You do need to understand the basics, that there are 8 or 16 integer registers + the flags and instruction pointer, and that every instruction makes a well-defined modification to the current architectural state of the machine. (See the Intel insn ref manual for complete descriptions of every instruction (links in the x86 wiki, along with much more good stuff).
You might want to start with simple things like writing a single function in asm, as part of a bigger program. Understanding the kind of asm needed to make system calls is useful, but in real programs it's normally only useful to hand-write asm for inner loops that don't involve any system calls. It's time-consuming to write asm to read input and print results, so I'd suggest doing that part in C. Make sure you read the compiler output and understand what's going on, and the difference between an integer and a string, and what strtol and printf do, even if you don't write them yourself.
Once you think you understand enough of the basics, find a function in some program you're familiar with and/or interested in, and see if you can beat the compiler and save instructions (or use faster instructions). Or implement it yourself without using the compiler output as a starting point, whichever you find more interesting. This answer might be interesting, although the focus there was finding C source that got the compiler to produce the optimal ASM.
How to try to solve your own problems (before asking an SO question)
There are many SO questions from people asking "how do I do X in asm", and the answer is usually "the same as you would in C". Don't get so caught up in asm being unfamiliar that you forget how to program. Figure out what needs to happen to the data the function operates on, then figure out how to do that in asm. If you get stuck and have to ask a question, you should have most of a working implementation, with just one part that you don't know what instructions to use for one step.
You should do this with 32 or 64bit x86. I'd suggest 64bit, since the ABI is nicer, but 32bit functions will force you to make more use of the stack. So that might help you understand how a call instruction puts the return address on the stack, and where the args the caller pushed actually are after that. (This appears to be what you tried to avoid dealing with by using inline asm).
Programming hardware directly is neat, but not a generally useful skill
Learning how to do graphics by directly modifying video RAM is not useful, other than to satisfy curiosity about how computers used to work. You can't use that knowledge for anything. Modern graphics APIs exist to let multiple programs draw in their own regions of the screen, and to allow indirection (e.g. draw on a texture instead of the screen directly, so 3D window-flipping alt-tab can look fancy). There are too many reasons to list here for not drawing directly on video RAM.
Drawing on a pixmap buffer and then using a graphics API to copy it to the screen is possible. Still, doing bitmap graphics at all is more or less obsolete, unless you're generating images for PNG or JPEG or something (e.g. optimize converting histogram bins to a scatter plot in the back-end code for a web service). Modern graphics APIs abstract away the resolution, so your app can draw things at a reasonable size regardless of how big each pixel is. (small but extremely high rez screen vs. big TV at low rez).
It is kind of cool to write to memory and see something change on-screen. Or even better, hook up LEDs (with small resistors) to the data bits on a parallel port, and run an outb instruction to turn them on/off. I did this on my Linux system ages ago. I made a little wrapper program that used iopl(2) and inline asm, and ran it as root. You can probably do similar on Windows. You don't need DOS or 16bit code to get your feet wet talking to the hardware.
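If you want to try that on a modern x86 Linux box, a hedged sketch with glibc's <sys/io.h> looks roughly like this; it must run as root, and 0x378 is only the traditional LPT1 data-port address, which your machine may not even have:
#include <stdio.h>
#include <sys/io.h>

#define LPT1_DATA 0x378   /* classic legacy parallel-port address; an assumption */

int main(void) {
    if (ioperm(LPT1_DATA, 1, 1) != 0) {   /* request access to that one I/O port */
        perror("ioperm (run as root?)");
        return 1;
    }
    outb(0xAA, LPT1_DATA);   /* drive alternating data pins high/low -> LEDs */
    return 0;
}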
in/out instructions, and normal loads/stores to memory-mapped IO, and DMA, are how real drivers talk to hardware, including things far more complicated than parallel ports. It's fun to know how your hardware "really" works, but only spend time on it if you're actually interested, or want to write drivers. The Linux source tree includes drivers for boatloads of hardware, and is often well commented, so if you like reading code as much as writing code, that's another way to get a feel for what real drivers do when they talk to hardware.
It's generally good to have some idea how things work under the hood. If you want to learn about how graphics used to work ages ago (with VGA text mode and color / attribute bytes), then sure, go nuts. Just be aware that modern OSes don't use VGA text mode, so you aren't even learning what happens under the hood on modern computers.
Many people enjoy https://retrocomputing.stackexchange.com/, reliving a simpler time when computers were less complex and couldn't support as many layers of abstraction. Just be aware that's what you're doing. It might be a good stepping stone to learning to write drivers for modern hardware, if you're sure that's why you want to understand asm / hardware.
Inline asm
You are taking a totally incorrect approach to using inline ASM. You seem to want to write whole functions in asm, so you should just do that. e.g. put your code in asmfuncs.S or something. Use .S if you want to keep using GNU / AT&T syntax; or use .asm if you want to use Intel / NASM / YASM syntax (which I would recommend, since the official manuals all use Intel syntax. See the x86 wiki for guides and manuals.)
GNU inline asm is the hardest way to learn ASM. You have to understand everything that your asm does, and what the compiler needs to know about it. It's really hard to get everything right. For example, in your edit, that block of inline asm modifies many registers that you don't list as clobbered, including %ebx which is a call-preserved register (so this is broken even if that function isn't inlined). At least you took out the ret, so things won't break as spectacularly when the compiler inlines this function into the loop that calls it. If that sounds really complicated, that's because it is, and part of why you shouldn't use inline asm to learn asm.
This answer to a similar question from misusing inline asm while trying to learn asm in the first place has more links about inline asm and how to use it well.
Getting this mess working, maybe
This part could be a separate answer, but I'll leave it together.
Besides your whole approach being fundamentally a bad idea, there is at least one specific problem with your put_char function: you use offset as an output-only operand. gcc quite happily compiles your whole function to a single ret instruction, because the asm statement isn't volatile, and its output isn't used. (Inline asm statements without outputs are assumed to be volatile.)
I put your function on godbolt, so I could look at what assembly the compiler generates surrounding it. That link is to the fixed maybe-working version, with correctly-declared clobbers, comments, cleanups, and optimizations. See below for the same code, if that external link ever breaks.
I used gcc 5.3 with the -m16 option, which is different from using a real 16bit compiler. It still does everything the 32bit way (using 32bit addresses, 32bit ints, and 32bit function args on the stack), but tells the assembler that the CPU will be in 16bit mode, so it will know when to emit operand-size and address-size prefixes.
Even if you compile your original version with -O0, the compiler computes offset = (y<<8) + (y<<6) + x;, but doesn't put it in %edi, because you didn't ask it to. Specifying it as another input operand would have worked. After the inline asm, it stores %edi into -12(%ebp), where offset lives.
Other stuff wrong with put_char:
You pass 2 things (ascii_char and current_color) into your function through globals, instead of function arguments. Yuck, that's disgusting. VGA and characters are constants, so loading them from globals doesn't look so bad. Writing in asm means you should ignore good coding practices only when it helps performance by a reasonable amount. Since the caller probably had to store those values into the globals, you're not saving anything compared to the caller storing them on the stack as function args. And for x86-64, you'd be losing perf because the caller could just pass them in registers.
Also:
j,h,l,i=0; // sets i=0, does nothing to j, h, or l.
// gcc warns: left-hand operand of comma expression has no effect
j;h;l;i=0; // equivalent to this
j=h=l=i=0; // This is probably what you meant
All the local variables are unused anyway, other than offset. Were you going to write it in C or something?
You use 16bit addresses for characters, but 32bit addressing modes for VGA memory. I assume this is intentional, but I have no idea if it's correct. Also, are you sure you should use a CS: override for the loads from characters? Does the .rodata section go into the code segment? Although you didn't declare uint8_t characters[464] as const, so it's probably just in the .data section anyway. I consider myself fortunate that I haven't actually written code for a segmented memory model, but that still looks suspicious.
If you're really using djgpp, then according to Michael Petch's comment, your code will run in 32bit mode. Using 16bit addresses is thus a bad idea.
Optimizations
You can avoid using %ebx entirely by doing this, instead of loading into ebx and then adding %ebx to %edi.
"add _VGA, %%edi\n\t" // load from _VGA, add to edi.
You don't need lea to get an address into a register. You can just use
"mov %%ax, %%si\n\t"
"add $_characters, %%si\n\t"
$_characters means the address as an immediate constant. We can save a lot of instructions by combining this with the previous calculation of the offset into the characters array of bitmaps. The immediate-operand form of imul lets us produce the result in %si in the first place:
"movzbw _ascii_char,%%si\n\t"
//"sub $32,%%ax\n\t" // AX = ascii_char - 32
"imul $7, %%si, %%si\n\t"
"add $(_characters - 32*7), %%si\n\t" // Do the -32 at the same time as adding the table address, after multiplying
// SI points to characters[(ascii_char-32)*7]
// i.e. the start of the bitmap for the current ascii character.
Since this form of imul only keeps the low 16b of the 16*16 -> 32b multiply, the 2- and 3-operand forms of imul can be used for signed or unsigned multiplies, which is why only imul (not mul) has those extra forms. For larger operand sizes, 2- and 3-operand imul is faster, because it doesn't have to store the high half in %[er]dx.
You could simplify the inner loop a bit, but it would complicate the outer loop slightly: you could branch on the zero flag, as set by shl $1, %al, instead of using a counter. That would make it also unpredictable, like the jump over store for non-foreground pixels, so the increased branch mispredictions might be worse than the extra do-nothing loops. It would also mean you'd need to recalculate %edi in the outer loop each time, because the inner loop wouldn't run a constant number of times. But it could look like:
... same first part of the loop as before
// re-initialize %edi to first_pixel-1, based on outer-loop counter
"lea -1(%%edi), %%ebx\n"
".Lbit_loop:\n\t" // map the 1bpp bitmap to 8bpp VGA memory
"incl %%ebx\n\t" // inc before shift, to preserve flags
"shl $1,%%al\n\t"
"jnc .Lskip_store\n\t" // transparency: only store on foreground pixels
"movb %%dl,(%%ebx)\n" //plot the pixel
".Lskip_store:\n\t"
"jnz .Lbit_loop\n\t" // flags still set from shl
"addl $320,%%edi\n\t" // WITHOUT the -6
"dec %%cl\n\t"
"jnz .Lbyte_loop\n\t"
Note that the bits in your character bitmaps are going to map to bytes in VGA memory like {7 6 5 4 3 2 1 0}, because you're testing the bit shifted out by a left shift. So it starts with the MSB. Bits in a register are always "big endian". A left shift multiplies by two, even on a little-endian machine like x86. Little-endian only affects ordering of bytes in memory, not bits in a byte, and not even bytes inside registers.
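For reference, the same MSB-first mapping written as plain C might look like the sketch below; the function and parameter names are mine, and it assumes the same layout as above (7 rows of 6 visible pixels per glyph, glyphs starting at ASCII 32, and a linear 320-byte-wide 8bpp framebuffer):
static void put_char_c(unsigned char *vga, const unsigned char *bitmaps,
                       int x, int y, int ascii_char, unsigned char color)
{
    const unsigned char *glyph = bitmaps + (ascii_char - 32) * 7;
    unsigned char *dst = vga + y * 320 + x;
    for (int row = 0; row < 7; row++, dst += 320) {
        unsigned char bits = glyph[row];
        for (int col = 0; col < 6; col++) {
            if (bits & (0x80 >> col))   /* test bits MSB-first, like shl+jnc */
                dst[col] = color;       /* transparent background: 0 bits skipped */
        }
    }
}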
A version of your function that might do what you intended.
This is the same as the godbolt link.
void put_char(int x,int y){
int offset = (y<<8) + (y<<6) + x;
__asm__ volatile ( // volatile is implicit for asm statements with no outputs, but better safe than sorry.
"add _VGA, %%edi\n\t" // edi points to VGA + offset.
"movzbw _ascii_char,%%si\n\t" // Better: use an input operand
//"sub $32,%%ax\n\t" // AX = ascii_char - 32
"imul $7, %%si, %%si\n\t" // can't fold the load into this because it's not zero-padded
"add $(_characters - 32*7), %%si\n\t" // Do the -32 at the same time as adding the table address, after multiplying
// SI points to characters[(ascii_char-32)*7]
// i.e. the start of the bitmap for the current ascii character.
"mov $7,%%cl\n"
".Lbyte_loop:\n\t"
"lodsb %%cs:(%%si)\n\t" //load next byte of bitmap
"mov $6,%%ch\n"
".Lbit_loop:\n\t" // map the 1bpp bitmap to 8bpp VGA memory
"shl $1,%%al\n\t"
"jnc .Lskip_store\n\t" // transparency: only store on foreground pixels
"movb %%dl,(%%edi)\n" //plot the pixel
".Lskip_store:\n\t"
"incl %%edi\n\t"
"dec %%ch\n\t"
"jnz .Lbit_loop\n\t"
"addl $320-6,%%edi\n\t"
"dec %%cl\n\t"
"jnz .Lbyte_loop\n\t"
: "+&D" (offset) // EDI modified by the asm, compiler needs to know that, so use a read-write "+" input. Early-clobber "&" because we read the other input after modifying this.
: "d" (current_color) // used read-only
: "%eax", "%ecx", "%esi", "memory"
// omit the memory clobber if your C never touches VGA memory, and your asm never loads/stores anywhere else.
// but that's not the case here: the asm loads from memory written by C
// without listing it as a memory operand (even a pointer in a register isn't sufficient)
// so gcc might optimize away "dead" stores to it, or reorder the asm with loads/stores to it.
);
}
Re: the "memory" clobber, see How can I indicate that the memory *pointed* to by an inline ASM argument may be used?
I didn't use dummy output operands to leave register allocation up to the compiler's discretion, but that's a good idea to reduce the overhead of getting data in the right places for inline asm. (extra mov instructions). For example, here there was no need to force the compiler to put offset in %edi. It could have been any register we aren't already using.

Which is faster: while(1) or while(2)?

This was an interview question asked by a senior manager.
Which is faster?
while(1) {
// Some code
}
or
while(2) {
//Some code
}
I said that both have the same execution speed, as the expression inside while should finally evaluate to true or false. In this case, both evaluate to true and there are no extra conditional instructions inside the while condition. So, both will have the same speed of execution and I prefer while (1).
But the interviewer said confidently:
"Check your basics. while(1) is faster than while(2)."
(He was not testing my confidence)
Is this true?
See also: Is "for(;;)" faster than "while (TRUE)"? If not, why do people use it?
Both loops are infinite, but we can see which one takes more instructions/resources per iteration.
Using gcc, I compiled the two following programs to assembly at varying levels of optimization:
int main(void) {
    while(1) {}
    return 0;
}

int main(void) {
    while(2) {}
    return 0;
}
Even with no optimizations (-O0), the generated assembly was identical for both programs. Therefore, there is no speed difference between the two loops.
For reference, here is the generated assembly (using gcc main.c -S -masm=intel with an optimization flag):
With -O0:
.file "main.c"
.intel_syntax noprefix
.def __main; .scl 2; .type 32; .endef
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
push rbp
.seh_pushreg rbp
mov rbp, rsp
.seh_setframe rbp, 0
sub rsp, 32
.seh_stackalloc 32
.seh_endprologue
call __main
.L2:
jmp .L2
.seh_endproc
.ident "GCC: (tdm64-2) 4.8.1"
With -O1:
.file "main.c"
.intel_syntax noprefix
.def __main; .scl 2; .type 32; .endef
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
sub rsp, 40
.seh_stackalloc 40
.seh_endprologue
call __main
.L2:
jmp .L2
.seh_endproc
.ident "GCC: (tdm64-2) 4.8.1"
With -O2 and -O3 (same output):
.file "main.c"
.intel_syntax noprefix
.def __main; .scl 2; .type 32; .endef
.section .text.startup,"x"
.p2align 4,,15
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
sub rsp, 40
.seh_stackalloc 40
.seh_endprologue
call __main
.L2:
jmp .L2
.seh_endproc
.ident "GCC: (tdm64-2) 4.8.1"
In fact, the assembly generated for the loop is identical for every level of optimization:
.L2:
jmp .L2
.seh_endproc
.ident "GCC: (tdm64-2) 4.8.1"
The important bits being:
.L2:
jmp .L2
I can't read assembly very well, but this is obviously an unconditional loop. The jmp instruction unconditionally resets the program back to the .L2 label without even comparing a value against true, and of course immediately does so again until the program is somehow ended. This directly corresponds to the C/C++ code:
L2:
goto L2;
Edit:
Interestingly enough, even with no optimizations, the following loops all produced the exact same output (unconditional jmp) in assembly:
while(42) {}
while(1==1) {}
while(2==2) {}
while(4<7) {}
while(3==3 && 4==4) {}
while(8-9 < 0) {}
while(4.3 * 3e4 >= 2 << 6) {}
while(-0.1 + 02) {}
And even to my amazement:
#include<math.h>
while(sqrt(7)) {}
while(hypot(3,4)) {}
Things get a little more interesting with user-defined functions:
int x(void) {
    return 1;
}
while(x()) {}

#include <math.h>
double x(void) {
    return sqrt(7);
}
while(x()) {}
At -O0, these two examples actually call x and perform a comparison for each iteration.
First example (returning 1):
.L4:
call x
testl %eax, %eax
jne .L4
movl $0, %eax
addq $32, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (tdm64-2) 4.8.1"
Second example (returning sqrt(7)):
.L4:
call x
xorpd %xmm1, %xmm1
ucomisd %xmm1, %xmm0
jp .L4
xorpd %xmm1, %xmm1
ucomisd %xmm1, %xmm0
jne .L4
movl $0, %eax
addq $32, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (tdm64-2) 4.8.1"
However, at -O1 and above, they both produce the same assembly as the previous examples (an unconditional jmp back to the preceding label).
TL;DR
Under GCC, the different loops are compiled to identical assembly. The compiler evaluates the constant values and doesn't bother performing any actual comparison.
The moral of the story is:
There exists a layer of translation between C source code and CPU instructions, and this layer has important implications for performance.
Therefore, performance cannot be evaluated by only looking at source code.
The compiler should be smart enough to optimize such trivial cases. Programmers should not waste their time thinking about them in the vast majority of cases.
Yes, while(1) is much faster than while(2), for a human to read! If I see while(1) in an unfamiliar codebase, I immediately know what the author intended, and my eyeballs can continue to the next line.
If I see while(2), I'll probably halt in my tracks and try to figure out why the author didn't write while(1). Did the author's finger slip on the keyboard? Do the maintainers of this codebase use while(n) as an obscure commenting mechanism to make loops look different? Is it a crude workaround for a spurious warning in some broken static analysis tool? Or is this a clue that I'm reading generated code? Is it a bug resulting from an ill-advised find-and-replace-all, or a bad merge, or a cosmic ray? Maybe this line of code is supposed to do something dramatically different. Maybe it was supposed to read while(w) or while(x2). I'd better find the author in the file's history and send them a "WTF" email... and now I've broken my mental context. The while(2) might consume several minutes of my time, when while(1) would have taken a fraction of a second!
I'm exaggerating, but only a little. Code readability is really important. And that's worth mentioning in an interview!
The existing answers showing the code generated by a particular compiler for a particular target with a particular set of options do not fully answer the question -- unless the question was asked in that specific context ("Which is faster using gcc 4.7.2 for x86_64 with default options?", for example).
As far as the language definition is concerned, in the abstract machine while (1) evaluates the integer constant 1, and while (2) evaluates the integer constant 2; in both cases the result is compared for equality to zero. The language standard says absolutely nothing about the relative performance of the two constructs.
I can imagine that an extremely naive compiler might generate different machine code for the two forms, at least when compiled without requesting optimization.
On the other hand, C compilers absolutely must evaluate some constant expressions at compile time, when they appear in contexts that require a constant expression. For example, this:
int n = 4;
switch (n) {
case 2+2: break;
case 4: break;
}
requires a diagnostic; a lazy compiler does not have the option of deferring the evaluation of 2+2 until execution time. Since a compiler has to have the ability to evaluate constant expressions at compile time, there's no good reason for it not to take advantage of that capability even when it's not required.
The C standard (N1570 6.8.5p4) says that
An iteration statement causes a statement called the loop body to be executed repeatedly until the controlling expression compares equal to 0.
So the relevant constant expressions are 1 == 0 and 2 == 0, both of which evaluate to the int value 0. (These comparisons are implicit in the semantics of the while loop; they don't exist as actual C expressions.)
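In other words, either loop can be thought of as the rough desugaring below. This is only an illustration of the abstract-machine semantics; it is not code that any compiler literally emits.
for (;;) {
    if ((2) == 0)   /* or (1) == 0 -- a constant expression, known to be false at compile time */
        break;
    /* loop body */
}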
A perversely naive compiler could generate different code for the two constructs. For example, for the first it could generate an unconditional infinite loop (treating 1 as a special case), and for the second it could generate an explicit run-time comparison equivalent to 2 != 0. But I've never encountered a C compiler that would actually behave that way, and I seriously doubt that such a compiler exists.
Most compilers (I'm tempted to say all production-quality compilers) have options to request additional optimizations. Under such an option, it's even less likely that any compiler would generate different code for the two forms.
If your compiler generates different code for the two constructs, first check whether the differing code sequences actually have different performance. If they do, try compiling again with an optimization option (if available). If they still differ, submit a bug report to the compiler vendor. It's not (necessarily) a bug in the sense of a failure to conform to the C standard, but it's almost certainly a problem that should be corrected.
Bottom line: while (1) and while(2) almost certainly have the same performance. They have exactly the same semantics, and there's no good reason for any compiler not to generate identical code.
And though it's perfectly legal for a compiler to generate faster code for while(1) than for while(2), it's equally legal for a compiler to generate faster code for while(1) than for another occurrence of while(1) in the same program.
(There's another question implicit in the one you asked: How do you deal with an interviewer who insists on an incorrect technical point. That would probably be a good question for the Workplace site).
Wait a minute. Did the interviewer look like this guy?
It's bad enough that the interviewer himself has failed this interview; what if other programmers at this company have "passed" this test?
No. Evaluating the expressions 1 == 0 and 2 == 0 should be equally fast. We could imagine a poor compiler implementation in which one happened to be faster than the other, but there's no good reason why one should be.
Even if there's some obscure circumstance in which the claim would be true, programmers should not be evaluated based on knowledge of obscure (and in this case, creepy) trivia. Don't worry about this interview; the best move here is to walk away.
Disclaimer: This is NOT an original Dilbert cartoon. This is merely a mashup.
Your explanation is correct. This seems to be a question that tests your self-confidence in addition to technical knowledge.
By the way, if you answered
Both pieces of code are equally fast, because both take infinite time to complete
the interviewer would say
But while (1) can do more iterations per second; can you explain why? (this is nonsense; testing your confidence again)
So by answering like you did, you saved some time which you would otherwise waste on discussing this bad question.
Here is example code generated by the compiler on my system (MS Visual Studio 2012), with optimizations turned off:
yyy:
xor eax, eax
cmp eax, 1 (or 2, depending on your code)
je xxx
jmp yyy
xxx:
...
With optimizations turned on:
xxx:
jmp xxx
So the generated code is exactly the same, at least with an optimizing compiler.
The most likely explanation for the question is that the interviewer thinks that the processor checks the individual bits of the numbers, one by one, until it hits a non-zero value:
1 = 00000001
2 = 00000010
If the "is zero?" algorithm starts from the right side of the number and has to check each bit until it reaches a non-zero bit, the while(1) { } loop would have to check twice as many bits per iteration as the while(2) { } loop.
This requires a very wrong mental model of how computers work, but it does have its own internal logic. One way to check would be to ask if while(-1) { } or while(3) { } would be equally fast, or if while(32) { } would be even slower.
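Spelled out in code, that misconception would look something like the sketch below. This is purely an illustration of the wrong mental model (the function name is made up); real hardware tests the whole word at once rather than walking the bits.
#include <stdbool.h>

/* The imagined "is zero?" check: scan bits from the least significant end
   until a set bit is found. Under this (incorrect) model, while(1) would
   "stop" after examining 1 bit and while(2) after 2 bits. */
bool is_truthy_naive(unsigned int x) {
    for (unsigned int bit = 0; bit < sizeof x * 8; ++bit) {
        if ((x >> bit) & 1u)
            return true;
    }
    return false;
}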
Of course I do not know the real intentions of this manager, but I propose a completely different view: when hiring a new member onto a team, it is useful to know how they react to conflict situations.
They drove you into a conflict. If that was deliberate, they are clever and the question was a good one. For some industries, like banking, posting your problem to Stack Overflow could be a reason for rejection.
But of course I do not know, I just propose one option.
I think the clue is to be found in "asked by a senior manager". This person obviously stopped programming when he became a manager, and it then took him several years to become a senior manager. He never lost interest in programming, but he hasn't written a line of code since those days. So his reference is not "any decent compiler out there", as some answers mention, but "the compiler this person worked with 20-30 years ago".
At that time, programmers spent a considerable percentage of their time trying out various methods for making their code faster and more efficient, as CPU time on 'the central minicomputer' was so valuable. As did people writing compilers. I'm guessing that the one-and-only compiler his company made available at that time optimized on the basis of 'frequently encountered statements that can be optimized', took a shortcut when encountering while(1), and evaluated everything else, including while(2). Having had such an experience could explain his position and his confidence in it.
The best approach to get you hired is probably one that enables the senior manager to get carried away and lecture you 2-3 minutes on "the good old days of programming" before YOU smoothly lead him towards the next interview subject. (Good timing is important here - too fast and you're interrupting the story - too slow and you are labelled as somebody with insufficient focus). Do tell him at the end of the interview that you'd be highly interested to learn more about this topic.
You should have asked him how he reached that conclusion. Under any decent compiler out there, the two compile to the same asm instructions, so he should have told you which compiler he had in mind to start off. And even then, you would have to know the compiler and platform very well to make even a theoretical educated guess. In the end, it doesn't really matter in practice, since other external factors, like memory fragmentation or system load, will influence the loop more than this detail.
For the sake of this question, I should add that I remember Doug Gwyn from the C Committee writing that some early C compilers without an optimizer pass would generate a test in assembly for while(1) (compared to for(;;), which wouldn't have one).
I would answer the interviewer by giving this historical note and then say that, even though I would be very surprised if any compiler did this, a compiler could:
without an optimizer pass, generate a test for both while(1) and while(2);
with an optimizer pass, be instructed to optimize every while(1) into an unconditional jump because it is considered idiomatic, which would leave while(2) with a test and therefore create a performance difference between the two.
I would of course add that treating while(1) and while(2) as different constructs is a sign of low-quality optimization, since they are equivalent.
Another take on such a question would be to see whether you have the courage to tell your manager that he or she is wrong, and how tactfully you can communicate it.
My first instinct would have been to generate assembly output to show the manager that any decent compiler should take care of it, and if it's not doing so, you will submit the next patch for it :)
That so many people have delved into this problem shows exactly why it could very well be a test of how quickly you want to micro-optimize things.
My answer would be: it doesn't matter that much; I'd rather focus on the business problem we are solving. After all, that's what I'm going to be paid for.
Moreover, I would opt for while(1) {} because it is more common, and other teammates would not need to spend time figuring out why someone chose a number higher than 1.
Now go write some code. ;-)
If you're that worried about optimisation, you should use
for (;;)
because that has no tests. (cynic mode)
It seems to me this is one of those behavioral interview questions masked as a technical question. Some companies do this - they will ask a technical question that should be fairly easy for any competent programmer to answer, but when the interviewee gives the correct answer, the interviewer will tell them they are wrong.
The company wants to see how you will react in this situation. Do you sit there quietly and not insist that your answer is correct, out of either self-doubt or fear of upsetting the interviewer? Or are you willing to challenge a person in authority who you know is wrong? They want to see whether you are willing to stand up for your convictions, and whether you can do it in a tactful and respectful manner.
Here's a problem: if you actually write a program and measure its speed, the speed of the two loops could well be different! For a somewhat reasonable comparison:
unsigned long i = 0;
while (1) { if (++i == 1000000000) break; }
unsigned long i = 0;
while (2) { if (++i == 1000000000) break; }
with some code added that prints the time, some random effect like how the loop is positioned within one or two cache lines could make a difference. One loop might by pure chance sit completely within one cache line, or at the start of a cache line, or it might straddle two cache lines. As a result, whatever the interviewer claims is fastest might actually be fastest - by coincidence.
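For completeness, the kind of timing harness I have in mind is sketched below; the numbers will jitter from run to run, which is exactly the point, since clock() granularity and code placement dominate any difference between the constants.
#include <stdio.h>
#include <time.h>

int main(void) {
    unsigned long i = 0;
    clock_t start = clock();
    while (2) { if (++i == 1000000000) break; }   /* swap in while (1) for the other run */
    clock_t end = clock();
    printf("%lu iterations in %.3f s\n", i, (double)(end - start) / CLOCKS_PER_SEC);
    return 0;
}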
Worst-case scenario: an optimising compiler doesn't figure out what the loop does, but does figure out that the second loop produces the same values as the first. It would then generate full code for the first loop, but not for the second.
I used to program C and Assembly code back when this sort of nonsense might have made a difference. When it did make a difference we wrote it in Assembly.
If I were asked that question, I would repeat Donald Knuth's famous 1974 quote about premature optimization and walk out if the interviewer didn't laugh and move on.
Maybe the interviewer posed such a dumb question intentionally and wanted you to make 3 points:
Basic reasoning. Both loops are infinite, so it's hard to talk about performance.
Knowledge of optimisation levels. He wanted to hear that if you let the compiler do any optimisation for you, it would optimise away the condition, especially if the block was not empty.
Knowledge of microprocessor architecture. Most architectures have a special CPU instruction for comparison with 0 (though it is not necessarily faster).
They are both equal - the same.
According to the specification, anything that is not 0 is considered true, so even without any optimization a good compiler will not generate any special code for while(1) versus while(2); at most it would generate the same simple check for != 0 in both cases.
Judging by the amount of time and effort people have spent testing, proving, and answering this very straightforward question, I'd say that both were made very slow by asking the question.
And so to spend even more time on it...
while (2) is ridiculous, because
while (1) and while (true) are what has historically been used to write an infinite loop that expects break to be called at some stage inside the loop, based on a condition that will certainly occur.
The 1 is simply there to always evaluate to true; therefore, saying while (2) is about as silly as saying while (1 + 1 == 2), which will also always evaluate to true.
And if you want to be completely silly just use: -
while (1 + 5 - 2 - (1 * 3) == 0.5 - 4 + ((9 * 2) / 4.0)) {
if (succeed())
break;
}
I think the interviewer made a typo that did not affect the running of the code, but if he intentionally used the 2 just to be weird, then sack him before he puts weird statements all through your code, making it difficult to read and work with.
That depends on the compiler.
If it optimizes the code, or if it evaluates 1 and 2 to true with the same number of instructions for a particular instruction set, the execution speed will be the same.
In real cases it will always be equally fast, but it is possible to imagine a particular compiler and a particular system where the two would be evaluated differently.
I mean: this is not really a question about the language (C) at all.
Since people looking at this question want the fastest loop, I would have answered that both compile into the same assembly code, as stated in the other answers. Nevertheless, you could suggest to the interviewer using 'loop unrolling': a do {} while loop instead of the while loop, as sketched below.
Caution: you need to ensure that the loop will always run at least once.
The loop should have a break condition inside.
Also, for that kind of loop I would personally prefer do {} while(42), since any integer except 0 would do the job.
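A minimal sketch of that shape is shown below; the counter and the exit condition are just stand-ins for whatever real work and termination test the loop would contain.
#include <stdio.h>

int main(void) {
    unsigned int i = 0;
    /* The body runs at least once; the constant test sits at the bottom of
       the loop, and an optimizing compiler drops it entirely. */
    do {
        if (++i == 10)   /* stand-in for a real exit condition */
            break;
    } while (42);
    printf("%u\n", i);
    return 0;
}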
The obvious answer is: as posted, both fragments would run an equally busy infinite loop, which makes the program infinitely slow.
Although redefining C keywords as macros would technically have undefined behavior, it is the only way I can think of to make either code fragment fast at all: you can add this line above the two fragments:
#define while(x) sleep(x);
it will indeed make while(1) twice as fast (or half as slow) as while(2).
The only reason I can think of why the while(2) would be any slower is:
The code optimizes the loop to
cmp eax, 2
When the subtract occurs you're essentially subtracting
a. 00000000 - 00000010 cmp eax, 2
instead of
b. 00000000 - 00000001 cmp eax, 1
cmp only sets flags and does not store a result. With b, the least significant bit tells you straight away whether you need to borrow; with a, you have to perform two bit-level subtractions before you get a borrow.

Is the machine code in LC-3 also known as assembly?

I'm a little confused about the concept of machine code...
Is machine code synonymous with assembly language?
What would be an example of the machine code in LC-3?
Assembly instructions (LD, ST, ADD, etc. in the case of the LC-3 simulator) correspond to binary instructions that are loaded and executed as a program. In the case of the LC-3, these instructions are assembled into 16-bit strings of 1s and 0s that the LC-3 architecture is designed to execute.
For example, the assembly "ADD R4 R4 #10" corresponds to the LC-3 "machine code":
0001100100101010
Which can be broken down as:
0001 - the ADD opcode
100 - 4 in binary (destination register R4)
100 - 4 in binary (first source register R4)
1 - indicates that we are adding an immediate value rather than a second register
01010 - 10 in binary (the 5-bit immediate value)
Note that each opcode has a distinct binary equivalent, so there are 2^4=16 possible opcodes.
The LC-3 sneakily stretches this limited opcode space by using a flag bit in certain instructions. For ADD, that bit (bit [5]) changes depending on what we're adding: 1 means the last five bits hold an immediate value, while 0 means the last three bits name a second source register (with bits [4:3] set to 00). For example, adding two registers (i.e. "ADD R4 R4 R7" as opposed to a register and a value) would set the flag to 0, giving 0001 100 100 0 00 111.
This machine code instructs the LC-3 to add decimal 10 to the value in register 4, and store the result in register 4.
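If it helps, you can reproduce that encoding yourself by packing the fields with shifts. The little C program below is only my own illustration of the bit layout described above; the variable names are not part of any LC-3 toolchain.
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t opcode = 0x1;   /* 0001 = ADD */
    uint16_t dr     = 4;     /* destination register R4 */
    uint16_t sr1    = 4;     /* first source register R4 */
    uint16_t imm5   = 10;    /* the literal 10 */

    /* ADD (immediate form): opcode | DR | SR1 | 1 | imm5 */
    uint16_t word = (uint16_t)((opcode << 12) | (dr << 9) | (sr1 << 6) | (1u << 5) | (imm5 & 0x1Fu));

    printf("0x%04X\n", (unsigned)word);   /* prints 0x192A, i.e. 0001100100101010 */
    return 0;
}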
An "Assembly language" is a symbolic (human-readable) language, which a program known as the "assembler" translates into the binary form, "machine code", which the CPU can load and execute. In LC-3, each instruction in machine code is a 16-bit word, while the corresponding instruction in the assembly language is a line of human-readable text (other architectures may have longer or shorter machine-code instructions, but the general concept it the same).
The same holds for any high-level language (such as C, Pascal, or Basic). The difference between an HLL and assembly is that each assembly-language statement corresponds to one machine operation (macros excepted), whereas in an HLL a single statement can compile into a whole lot of machine code.
You can say that assembly is a thin layer of mnemonics on top of machine code.
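To make the one-to-many point concrete, here is a trivial (and compiler-dependent) example: the single C statement in the function below typically turns into several machine operations (loads, a multiply, an add, a return), whereas each line of LC-3 assembly corresponds to exactly one 16-bit instruction.
int scale_and_offset(int a, int b, int c) {
    /* One C statement, but conceptually several machine steps:
       fetch a and b, multiply them, add c, and return the result. */
    return a * b + c;
}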

Resources