I have the following assembly code:
global stuff
stuff:
;do stuff
I wish to call this from C code. Can it be called from a C program that contains it in an _asm() block?
Just because Linux does it is not a reason to do something. I am baffled by the use/desire to use inline assembly, but yes, sure, there are ways to do this which you could easily figure out. If you are asking this you are not quite ready to write an operating system. Keep working on it though. You need C and tool basics before you begin. An operating system is essentially a big bare-metal program.
If you tagged this nasm, then you are not interested in inline asm anyway; just use real asm, gas or nasm.
This
int fun ( void )
{
return 5;
}
does/can become:
0000000000000000 <fun>:
0: b8 05 00 00 00 mov $0x5,%eax
5: c3 retq
so that means I can do this
.globl fun
fun:
mov $0x5,%eax
retq
and this
#include <stdio.h>
int fun ( void );
int main ( void )
{
printf("%d\n",fun());
return 0;
}
and build a binary linking the two parts which prints 5 when run.
So then with nasm I can
global fun
fun:
mov eax,5
ret
confirming it is the same machine code in this case or at least an equivalent.
0000000000000000 <fun>:
0: b8 05 00 00 00 mov $0x5,%eax
5: c3 retq
so I can link that in instead and it prints 5 as well.
So now I can do a simple inline, very real asm like, perhaps what you were asking
#include <stdio.h>
int fun ( void );
asm(".globl fun ; fun: mov $0x5,%eax ; retq");
int main ( void )
{
printf("%d\n",fun());
return 0;
}
This was using gcc, inline asm is tool specific and not assumed to be portable.
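As a hedged sketch of one step beyond the basic asm above: GCC's extended asm lets the compiler pick the register instead of relying on the calling convention. The name fun_inline is made up for illustration, the x86 branch assumes GCC or Clang with AT&T syntax, and on any other target the sketch simply falls back to plain C.

```c
/* Hypothetical helper (not from the original answer): returns 5 like fun(),
   but lets the compiler choose the destination register. */
int fun_inline(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    int result;
    /* %0 is replaced by whichever 32-bit register the compiler assigns. */
    __asm__ ("movl $5, %0" : "=r"(result));
    return result;
#else
    /* Assumption: on other compilers or targets we fall back to plain C. */
    return 5;
#endif
}
```

Calling printf("%d\n", fun_inline()); from main should print 5, just like the separately assembled versions.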
And now you can grossly over complicate it from there.
Using an abstraction to perform I/O operations ("its basically writing byte x to port y") in an OS (or anywhere) is absolutely the right thing to do (you do not want to inline something like that), so a separate function, be it real asm or C or some hybrid, is a good idea worth pursuing. At the end of the day, though, for an access-type function like that you need to be in complete control over the instruction used, so however you choose to do that is up to you. But elementary use of the tools and the language is required before starting any kind of work like this. You can examine different operating systems that exist now as a reference, but this is yours not theirs, your personal preferences not someone else's, your knowledge of the language and tools and assumptions not someone else's. They may have a system-level implementation of something that you may not see all of, and you can fall into traps by simply copying a piece here or there.
If in C I write:
int num;
Before I assign anything to num, is the value of num indeterminate?
Static variables (file scope and function static) are initialized to zero:
int x; // zero
int y = 0; // also zero
void foo() {
static int x; // also zero
}
Non-static variables (local variables) are indeterminate. Reading them prior to assigning a value results in undefined behavior.
void foo() {
int x;
printf("%d", x); // the compiler is free to crash here
}
In practice, they tend to just have some nonsensical value in there initially - some compilers may even put in specific, fixed values to make it obvious when looking in a debugger - but strictly speaking, the compiler is free to do anything from crashing to summoning demons through your nasal passages.
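Only the well-defined half of this can be checked safely. A minimal sketch of the static cases described above, with names invented for illustration:

```c
/* All names here are illustrative, not from the original answer. */
int file_scope_value;       /* static storage duration: starts at 0 */

int next_call_number(void)
{
    static int calls;       /* also starts at 0 and keeps its value between calls */
    return ++calls;
}

int read_local_safely(void)
{
    int x = 0;              /* an automatic variable must be assigned before it is read */
    return x;
}
```

The first call to next_call_number() returns 1 and the second returns 2, because the hidden counter started at zero without any explicit initializer.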
As for why it's undefined behavior instead of simply "undefined/arbitrary value", there are a number of CPU architectures that have additional flag bits in their representation for various types. A modern example would be the Itanium, which has a "Not a Thing" bit in its registers; of course, the C standard drafters were considering some older architectures.
Attempting to work with a value with these flag bits set can result in a CPU exception in an operation that really shouldn't fail (eg, integer addition, or assigning to another variable). And if you go and leave a variable uninitialized, the compiler might pick up some random garbage with these flag bits set - meaning touching that uninitialized variable may be deadly.
0 if static or global, indeterminate if storage class is auto
C has always been very specific about the initial values of objects. If global or static, they will be zeroed. If auto, the value is indeterminate.
This was the case in pre-C89 compilers and was so specified by K&R and in DMR's original C report.
This was the case in C89, see section 6.5.7 Initialization.
If an object that has automatic
storage duration is not initialized
explicitly, its value is
indeterminate. If an object that has
static storage duration is not
initialized explicitly, it is
initialized implicitly as if every
member that has arithmetic type were
assigned 0 and every member that has
pointer type were assigned a null
pointer constant.
This was the case in C99, see section 6.7.8 Initialization.
If an object that has automatic
storage duration is not initialized
explicitly, its value is
indeterminate. If an object that has
static storage duration is not
initialized explicitly, then: — if it
has pointer type, it is initialized to
a null pointer; — if it has arithmetic
type, it is initialized to (positive
or unsigned) zero; — if it is an
aggregate, every member is initialized
(recursively) according to these
rules; — if it is a union, the first
named member is initialized
(recursively) according to these
rules.
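The four rules quoted above can be exercised with one static object. This is only a sketch; the type and variable names are assumptions chosen for the example.

```c
#include <stddef.h>

/* Illustrative types and names (assumptions, not from the standard's text). */
union number { int as_int; double as_double; };

struct record {
    const char  *name;       /* pointer type    -> null pointer */
    double       weight;     /* arithmetic type -> zero */
    int          counts[3];  /* aggregate       -> each element zeroed */
    union number value;      /* union           -> first named member zeroed */
};

static struct record default_record;   /* static storage, no initializer */

/* Returns 1 if every member has the value the quoted rules require. */
int default_record_is_zeroed(void)
{
    return default_record.name == NULL
        && default_record.weight == 0.0
        && default_record.counts[0] == 0
        && default_record.counts[1] == 0
        && default_record.counts[2] == 0
        && default_record.value.as_int == 0;
}
```

Note that only the first named member of the union (as_int here) is covered by the rule, so the check deliberately does not read as_double.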
As to what exactly indeterminate means, I'm not sure for C89, C99 says:
3.17.2 indeterminate value: either an unspecified value or a trap
representation
But regardless of what standards say, in real life, each stack page actually does start off as zero, but when your program looks at any auto storage class values, it sees whatever was left behind by your own program when it last used those stack addresses. If you allocate a lot of auto arrays you will see them eventually start neatly with zeroes.
You might wonder, why is it this way? A different SO answer deals with that question, see: https://stackoverflow.com/a/2091505/140740
It depends on the storage duration of the variable. A variable with static storage duration is always implicitly initialized with zero.
As for automatic (local) variables, an uninitialized variable has indeterminate value. An indeterminate value, among other things, means that whatever "value" you might "see" in that variable is not only unpredictable, it is not even guaranteed to be stable. For example, in practice (i.e. ignoring the UB for a second) this code
int num;
int a = num;
int b = num;
does not guarantee that variables a and b will receive identical values. Interestingly, this is not some pedantic theoretical concept; it readily happens in practice as a consequence of optimization.
So in general, the popular answer that "it is initialized with whatever garbage was in memory" is not even remotely correct. An uninitialized variable's behavior is different from that of a variable initialized with garbage.
Ubuntu 15.10, Kernel 4.2.0, x86-64, GCC 5.2.1 example
Enough standards, let's look at an implementation :-)
Local variable
Standards: undefined behavior.
Implementation: the program allocates stack space, and never moves anything to that address, so whatever was there previously is used.
#include <stdio.h>
int main() {
int i;
printf("%d\n", i);
}
compile with:
gcc -O0 -std=c99 a.c
outputs:
0
and decompiles with:
objdump -dr a.out
to:
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 48 83 ec 10 sub $0x10,%rsp
40053e: 8b 45 fc mov -0x4(%rbp),%eax
400541: 89 c6 mov %eax,%esi
400543: bf e4 05 40 00 mov $0x4005e4,%edi
400548: b8 00 00 00 00 mov $0x0,%eax
40054d: e8 be fe ff ff callq 400410 <printf@plt>
400552: b8 00 00 00 00 mov $0x0,%eax
400557: c9 leaveq
400558: c3 retq
From our knowledge of x86-64 calling conventions:
%rdi is the first printf argument, thus the string "%d\n" at address 0x4005e4
%rsi is the second printf argument, thus i.
It comes from -0x4(%rbp), which is the first 4-byte local variable.
At this point, rbp is in the first page of the stack, which has been allocated by the kernel, so to understand that value we would need to look into the kernel code and find out what it sets that memory to.
TODO does the kernel set that memory to something before reusing it for other processes when a process dies? If not, the new process would be able to read the memory of other finished programs, leaking data. See: Are uninitialized values ever a security risk?
We can then also play with our own stack modifications and write fun things like:
#include <assert.h>
int f() {
int i = 13;
return i;
}
int g() {
int i;
return i;
}
int main() {
f();
assert(g() == 13);
}
Note that GCC 11 seems to produce a different assembly output, and the above code stops "working", it is undefined behavior after all: Why does -O3 in gcc seem to initialize my local variable to 0, while -O0 does not?
Local variable in -O3
Implementation analysis at: What does <value optimized out> mean in gdb?
Global variables
Standards: 0
Implementation: .bss section.
#include <stdio.h>
int i;
int main() {
printf("%d\n", i);
}
gcc -O0 -std=c99 a.c
compiles to:
0000000000400536 <main>:
400536: 55 push %rbp
400537: 48 89 e5 mov %rsp,%rbp
40053a: 8b 05 04 0b 20 00 mov 0x200b04(%rip),%eax # 601044 <i>
400540: 89 c6 mov %eax,%esi
400542: bf e4 05 40 00 mov $0x4005e4,%edi
400547: b8 00 00 00 00 mov $0x0,%eax
40054c: e8 bf fe ff ff callq 400410 <printf@plt>
400551: b8 00 00 00 00 mov $0x0,%eax
400556: 5d pop %rbp
400557: c3 retq
400558: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
40055f: 00
# 601044 <i> says that i is at address 0x601044 and:
readelf -SW a.out
contains:
[25] .bss NOBITS 0000000000601040 001040 000008 00 WA 0 0 4
which says 0x601044 is right in the middle of the .bss section, which starts at 0x601040 and is 8 bytes long.
The ELF standard then guarantees that the section named .bss is completely filled with zeros:
.bss This section holds uninitialized data that contribute to the
program’s memory image. By definition, the system initializes the
data with zeros when the program begins to run. The section
occupies no file space, as indicated by the section type, SHT_NOBITS.
Furthermore, the type SHT_NOBITS is efficient and occupies no space on the executable file:
sh_size This member gives the section’s size in bytes. Unless the
section type is SHT_NOBITS, the section occupies sh_size
bytes in the file. A section of type SHT_NOBITS may have a non-zero
size, but it occupies no space in the file.
Then it is up to the Linux kernel to zero out that memory region when loading the program into memory when it gets started.
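A small sketch of what this guarantee means in practice: a large object with static storage duration and no initializer reads as all zeros without any explicit clearing. The names below are invented for the example, and the placement in .bss is what a typical ELF toolchain would do, not something the C standard promises.

```c
#include <stddef.h>

/* Illustrative names. A large zero-initialized object like this would
   typically be placed in .bss, so it should not enlarge the file on disk. */
static unsigned char big_buffer[1 << 20];

/* Counts the bytes in data[0..length) that are not zero. */
size_t count_nonzero_bytes(const unsigned char *data, size_t length)
{
    size_t nonzero = 0;
    for (size_t i = 0; i < length; i++) {
        if (data[i] != 0)
            nonzero++;
    }
    return nonzero;
}

/* Scans the whole static buffer; expected to find no nonzero byte. */
size_t big_buffer_nonzero_bytes(void)
{
    return count_nonzero_bytes(big_buffer, sizeof big_buffer);
}
```

Running readelf -SW on a binary built from this should show the .bss size grow by about 1 MiB while the file size stays nearly unchanged.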
That depends. If that definition is global (outside any function) then num will be initialized to zero. If it's local (inside a function) then its value is indeterminate. In theory, even attempting to read the value has undefined behavior -- C allows for the possibility of bits that don't contribute to the value, but have to be set in specific ways for you to even get defined results from reading the variable.
The basic answer is, yes it is undefined.
If you are seeing odd behavior because of this, it may depend on where the variable is declared. If it is declared within a function, on the stack, then the contents will more than likely be different every time the function gets called. If it has static or module scope, it is zero-initialized rather than undefined, and it will not change on its own.
Because computers have finite storage capacity, automatic variables will typically be held in storage elements (whether registers or RAM) that have previously been used for some other arbitrary purpose. If such a variable is used before a value has been assigned to it, that storage may hold whatever it held previously, and so the contents of the variable will be unpredictable.
As an additional wrinkle, many compilers may keep variables in registers which are larger than the associated types. Although a compiler would be required to ensure that any value which is written to a variable and read back will be truncated and/or sign-extended to its proper size, many compilers will perform such truncation when variables are written and expect that it will have been performed before the variable is read. On such compilers, something like:
uint16_t hey(uint32_t x, uint32_t mode)
{ uint16_t q;
if (mode==1) q=2;
if (mode==3) q=4;
return q; }
uint32_t wow(uint32_t mode) {
return hey(1234567, mode);
}
might very well result in wow() storing the values 1234567 and mode into
registers 0 and 1, respectively, and calling hey(). Since x isn't needed
within "hey", and since functions are supposed to put their return value into
register 0, the compiler may allocate register 0 to q. If mode is 1 or
3, register 0 will be loaded with 2 or 4, respectively, but if it is some
other value, the function may return whatever was in register 0 (i.e. the
value 1234567) even though that value is not within the range of uint16_t.
To avoid requiring compilers to do extra work to ensure that uninitialized
variables never seem to hold values outside their domain, and avoid needing
to specify indeterminate behaviors in excessive detail, the Standard says
that use of uninitialized automatic variables is Undefined Behavior. In
some cases, the consequences of this may be even more surprising than a
value being outside the range of its type. For example, given:
void moo(int mode)
{
if (mode < 5)
launch_nukes();
hey(0, mode);
}
a compiler could infer that because invoking moo() with a mode which is
greater than 3 will inevitably lead to the program invoking Undefined
Behavior, the compiler may omit any code which would only be relevant
if mode is 4 or greater, such as the code which would normally prevent
the launch of nukes in such cases. Note that neither the Standard, nor
modern compiler philosophy, would care about the fact that the return value
from "hey" is ignored--the act of trying to return it gives a compiler
unlimited license to generate arbitrary code.
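For contrast, here is a hedged sketch of the same function with the Undefined Behavior removed. The name hey_defined and the default result of 0 are assumptions for the example; any defined default would do.

```c
#include <stdint.h>

/* Same logic as hey(), but q has a defined value on every path.
   The name hey_defined and the default of 0 are assumptions for this sketch. */
uint16_t hey_defined(uint32_t x, uint32_t mode)
{
    uint16_t q = 0;          /* defined default for modes other than 1 and 3 */
    (void)x;                 /* x is unused, as in the original */
    if (mode == 1) q = 2;
    if (mode == 3) q = 4;
    return q;
}
```

With q initialized, no value of mode leads to Undefined Behavior, so a compiler can no longer reason that larger modes "cannot happen" and drop checks such as the one guarding launch_nukes().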
If the storage class is static or global, then during loading the variable's memory location, which lives in the BSS, is initialised to 0 unless the variable is explicitly given an initial value. A local uninitialized variable may instead hold a trap representation. So if any of your registers containing important information is overwritten by the compiler, the program may crash.
However, some compilers may have a mechanism to avoid such a problem.
I was working with the NEC V850 series when I realised that there is a trap representation, with bit patterns that represent undefined values, for data types except for char. When I took an uninitialized char I got a default value of zero due to the trap representation. This might be useful for anyone using the NEC V850ES.
As far as I have seen, it mostly depends on the compiler, but in most cases the value is assumed to be 0 by the compilers.
I got garbage value in case of VC++ while TC gave value as 0.
I printed it like below:
int i;
printf("%d",i);
On Windows data can be loaded from DLLs, but it requires indirection through a pointer in the import address table. As a result, the compiler must know if an object that is being accessed is being imported from a DLL by using the __declspec(dllimport) type specifier.
This is unfortunate because it means that a header for a Windows library designed to be used as either a static library or a dynamic library needs to know which version of the library the program is linking to. This requirement is not applicable to functions, which are transparently emulated for DLLs with a stub function calling the real function, whose address is stored in the import address table.
On Linux the dynamic linker (ld.so) copies the values of all linked data objects from a shared object into a private mapped region for each process. This doesn't require indirection because the address of the private mapped region is local to the module, so its address is decided when the program is linked (and in the case of position independent executables, relative addressing is used).
Why doesn't Windows do the same? Is there a situation where a DLL might be loaded more than once, and thus require multiple copies of linked data? Even if that was the case, it wouldn't be applicable to read only data.
It seems that the MSVCRT handles this issue by defining the _DLL macro when targeting the dynamic C runtime library (with the /MD or /MDd flag), then using that in all standard headers to conditionally declare all exported symbols with __declspec(dllimport). I suppose you could reuse this macro if you only supported statically linking when using the static C runtime and dynamically linking when using the dynamic C runtime.
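The header problem described above is commonly handled with a per-library macro. The sketch below is one such arrangement, not the MSVCRT's own; MYLIB_STATIC, MYLIB_BUILDING_DLL, MYLIB_API and mylib_version are all hypothetical names, and the demo defines MYLIB_STATIC itself so that the header and the definition can share one file.

```c
/* mylib.h sketch (all names hypothetical). */
#define MYLIB_STATIC 1   /* assumption: this demo pretends to link statically */

#if defined(_WIN32) && !defined(MYLIB_STATIC)
#  ifdef MYLIB_BUILDING_DLL
#    define MYLIB_API __declspec(dllexport)   /* compiling the DLL itself */
#  else
#    define MYLIB_API __declspec(dllimport)   /* consuming the DLL */
#  endif
#else
#  define MYLIB_API                           /* static library, or not Windows */
#endif

extern MYLIB_API int mylib_version;   /* exported data needs the annotation */

/* This definition would normally live in the library's own source file. */
int mylib_version = 3;

int read_mylib_version(void)
{
    return mylib_version;
}
```

The cost is the one the question complains about: the consumer has to define MYLIB_STATIC (or not) to match the library variant it actually links against.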
References:
LNK4217 - Russ Keldorph's WebLog (emphasis mine)
__declspec(dllimport) can be used on both code and data, and its semantics are subtly different between the two. When applied to a routine call, it is purely a performance optimization. For data, it is required for correctness.
[...]
Importing data
If you export a data item from a DLL, you must declare it with __declspec(dllimport) in the code that accesses it. In this case, instead of generating a direct load from memory, the compiler generates a load through a pointer, resulting in one additional indirection. Unlike calls, where the linker will fix up the code correctly whether the routine was declared __declspec(dllimport) or not, accessing imported data requires __declspec(dllimport). If omitted, the code will wind up accessing the IAT entry instead of the data in the DLL, probably resulting in unexpected behavior.
Importing into an Application Using __declspec(dllimport)
Using __declspec(dllimport) is optional on function declarations, but the compiler produces more efficient code if you use this keyword. However, you must use __declspec(dllimport) for the importing executable to access the DLL's public data symbols and objects.
Importing Data Using __declspec(dllimport)
When you mark the data as __declspec(dllimport), the compiler automatically generates the indirection code for you.
Importing Using DEF Files (interesting historical notes about accessing the IAT directly)
How do I share data in my DLL with an application or with other DLLs?
By default, each process using a DLL has its own instance of all the DLLs global and static variables.
Linker Tools Warning LNK4217
What happens when you get dllimport wrong? (seems to be unaware of data semantics)
How do I export data from a DLL?
CRT Library Features (documents the _DLL macro)
Linux and Windows use different strategies for accessing data stored in dynamic libraries.
On Linux, an undefined reference to an object is resolved to a library at link time. The linker finds the size of the object and reserves space for it in the .bss or the .rdata segment of the executable. When executed, the dynamic linker (ld.so) resolves the symbol to a dynamic library (again), and copies the object from the dynamic library to the process's memory.
On Windows, an undefined reference to an object is resolved to an import library at link time, and no space is reserved for it. When the module is executed, the dynamic linker resolves the symbol to a dynamic library, and creates a copy on write memory map in the process, backed by a shared data segment in the dynamic library.
The advantage of a copy on write memory map is that if the linked data is unchanged, then it can be shared with other processes. In practice this is a trifling benefit which greatly increases complexity, both for the toolchain and programs using dynamic libraries. For objects which are actually written this is always less efficient.
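Copy-on-write mapping itself can be observed on a POSIX system with a private file mapping. This is only an analogy to what the Windows loader does, not its actual mechanism; the function name is made up, and the sketch assumes tmpfile() and mmap() are available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical demo: returns 1 if a write through a MAP_PRIVATE mapping
   left the backing file unchanged, 0 if not (or if mmap failed),
   and -1 if no temporary file could be created. */
int private_mapping_is_copy_on_write(void)
{
    static const char original[] = "shared";
    char from_file[sizeof original] = {0};
    int ok = 0;

    FILE *file = tmpfile();
    if (file == NULL)
        return -1;

    if (fwrite(original, 1, sizeof original, file) == sizeof original
        && fflush(file) == 0) {
        char *view = mmap(NULL, sizeof original, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE, fileno(file), 0);
        if (view != MAP_FAILED) {
            view[0] = 'S';   /* first write gives this process a private page */
            rewind(file);
            ok = fread(from_file, 1, sizeof from_file, file) == sizeof from_file
                 && memcmp(from_file, original, sizeof original) == 0
                 && view[0] == 'S';
            munmap(view, sizeof original);
        }
    }
    fclose(file);
    return ok;
}
```

Until the write, the page can be shared with every other process mapping the same file; the write is what forces the private copy, which is the extra cost the paragraph above refers to.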
I suspect, although I have no evidence, that this decision was made for a particular and now outdated use case. Perhaps it was common practice to use large (for the time) read only objects in dynamic libraries on 16-bit Windows (in official Microsoft programs or otherwise). Either way, I doubt anyone at Microsoft has the expertise and time to change it now.
In order to investigate the issue I created a program which writes to an object from a dynamic library. It writes one byte per page (4096 bytes) in the object, then writes the entire object, then retries the initial one byte per page write. If the object is reserved for the process before main is called, the first and third loops should take approximately the same time, and the second loop should take longer than both. If the object is a copy on write map to a dynamic library, the first loop should take at least as long as the second, and the third should take less time than both.
The results are consistent with my hypothesis, and analyzing the disassembly confirms that Linux accesses the dynamic library data at a link time address, relative to the program counter. Surprisingly, Windows not only accesses the data indirectly, but also reloads the pointer to the data and its length from the import address table on every loop iteration, even with optimizations enabled. This was tested with Visual Studio 2010 on Windows XP, so maybe things have changed, although I doubt they have.
Here are the results for Linux:
$ dd bs=1M count=16 if=/dev/urandom of=libdat.dat
$ xxd -i libdat.dat libdat.c
$ gcc -O3 -g -shared -fPIC libdat.c -o libdat.so
$ gcc -O3 -g -no-pie -L. -ldat dat.c -o dat
$ LD_LIBRARY_PATH=. ./dat
local = 0x1601060
libdat_dat = 0x601040
libdat_dat_len = 0x601020
dirty= 461us write= 12184us retry= 456us
$ nm dat
[...]
0000000000601040 B libdat_dat
0000000000601020 B libdat_dat_len
0000000001601060 B local
[...]
$ objdump -d -j.text dat
[...]
400693: 8b 35 87 09 20 00 mov 0x200987(%rip),%esi # 601020 <libdat_dat_len>
[...]
4006a3: 31 c0 xor %eax,%eax # zero loop counter
4006a5: 48 8d 15 94 09 20 00 lea 0x200994(%rip),%rdx # 601040 <libdat_dat>
4006ac: 0f 1f 40 00 nopl 0x0(%rax) # align loop for efficiency
4006b0: 89 c1 mov %eax,%ecx # store data offset in ecx
4006b2: 05 00 10 00 00 add $0x1000,%eax # add PAGESIZE to data offset
4006b7: c6 04 0a 00 movb $0x0,(%rdx,%rcx,1) # write a zero byte to data
4006bb: 39 f0 cmp %esi,%eax # test loop condition
4006bd: 72 f1 jb 4006b0 <main+0x30> # continue loop if data is left
[...]
Here are the results for Windows:
$ cl /Ox /Zi /LD libdat.c /link /EXPORT:libdat_dat /EXPORT:libdat_dat_len
[...]
$ cl /Ox /Zi dat.c libdat.lib
[...]
$ dat.exe # note low resolution timer means retry is too small to measure
local = 0041EEA0
libdat_dat = 1000E000
libdat_dat_len = 1100E000
dirty= 20312us write= 3125us retry= 0us
$ dumpbin /symbols dat.exe
[...]
9000 .data
1000 .idata
5000 .rdata
1000 .reloc
17000 .text
[...]
$ dumpbin /disasm dat.exe
[...]
004010BA: 33 C0 xor eax,eax # zero loop counter
[...]
004010C0: 8B 15 8C 63 42 00 mov edx,dword ptr [__imp__libdat_dat] # store data pointer in edx
004010C6: C6 04 02 00 mov byte ptr [edx+eax],0 # write a zero byte to data
004010CA: 8B 0D 88 63 42 00 mov ecx,dword ptr [__imp__libdat_dat_len] # store data length in ecx
004010D0: 05 00 10 00 00 add eax,1000h # add PAGESIZE to data offset
004010D5: 3B 01 cmp eax,dword ptr [ecx] # test loop condition
004010D7: 72 E7 jb 004010C0 # continue loop if data is left
[...]
Here is the source code used for both tests:
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
typedef FILETIME time_l;
time_l time_get(void) {
FILETIME ret; GetSystemTimeAsFileTime(&ret); return ret;
}
long long int time_diff(time_l const *c1, time_l const *c2) {
return 1LL*c2->dwLowDateTime/100-c1->dwLowDateTime/100+c2->dwHighDateTime*100000-c1->dwHighDateTime*100000;
}
#else
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
typedef struct timespec time_l;
time_l time_get(void) {
time_l ret; clock_gettime(CLOCK_MONOTONIC, &ret); return ret;
}
long long int time_diff(time_l const *c1, time_l const *c2) {
return 1LL*c2->tv_nsec/1000-c1->tv_nsec/1000+c2->tv_sec*1000000-c1->tv_sec*1000000;
}
#endif
#ifndef PAGESIZE
#define PAGESIZE 4096
#endif
#ifdef _WIN32
#define DLLIMPORT __declspec(dllimport)
#else
#define DLLIMPORT
#endif
extern DLLIMPORT unsigned char volatile libdat_dat[];
extern DLLIMPORT unsigned int libdat_dat_len;
unsigned int local[4096];
int main(void) {
unsigned int i;
time_l t1, t2, t3, t4;
long long int d1, d2, d3;
t1 = time_get();
for(i=0; i < libdat_dat_len; i+=PAGESIZE) {
libdat_dat[i] = 0;
}
t2 = time_get();
for(i=0; i < libdat_dat_len; i++) {
libdat_dat[i] = 0xFF;
}
t3 = time_get();
for(i=0; i < libdat_dat_len; i+=PAGESIZE) {
libdat_dat[i] = 0;
}
t4 = time_get();
d1 = time_diff(&t1, &t2);
d2 = time_diff(&t2, &t3);
d3 = time_diff(&t3, &t4);
printf("%-15s= %18p\n%-15s= %18p\n%-15s= %18p\n", "local", local, "libdat_dat", libdat_dat, "libdat_dat_len", &libdat_dat_len);
printf("dirty=%9lldus write=%9lldus retry=%9lldus\n", d1, d2, d3);
return 0;
}
I sincerely hope someone else benefits from my research. Thanks for reading!
I did an experiment to see what kind of assembly language would be generated if I tried to get the same function compiled in twice. I did the following:
I created two simple test files and their corresponding headers. Let's call them a.c/a.h, and b.c/b.h. Here are the contents of those files:
a.h:
#ifndef __A_H__
#define __A_H__
int a( void );
#endif
b.h:
#ifndef __B_H__
#define __B_H__
int b( void );
#endif
a.c:
#include "a.h"
int a( void )
{
return 1;
}
b.c:
#include "b.h"
#include "a.h"
int b( void )
{
return 1 + a();
}
I then created a static archive for a:
gcc -c a.c -o a.o
ar -rsc a.a a.o
and the same for b, including the static archive for a this time:
gcc -c b.c -o b.o
ar -rsc b.a a.a b.o
At this point, I disassemble the static archive for b to verify that it has assembly code for both functions a() and b(). It does.
Now, I define one last file:
main.c:
#include <stdio.h>
#include "a.h"
#include "b.h"
int main( void )
{
printf( "%d %d\n", a(), b() );
return 0;
}
and I compile it thusly:
gcc main.c a.a b.a -o main
This works fine. When I disassemble it, I see the following definitions for a and b in the code:
0000000000400561 <a>:
400561: 55 push %rbp
400562: 48 89 e5 mov %rsp,%rbp
400565: b8 01 00 00 00 mov $0x1,%eax
40056a: 5d pop %rbp
40056b: c3 retq

000000000040056c <b>:
40056c: 55 push %rbp
40056d: 48 89 e5 mov %rsp,%rbp
400570: e8 ec ff ff ff callq 400561 <a>
400575: 83 c0 01 add $0x1,%eax
400578: 5d pop %rbp
400579: c3 retq
40057a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
As you can see, the code has clearly defined b as calling a rather than inlining it. However, there is only one definition of a in the code, with no duplicates.
It seems that gcc has either:
Detected the duplicate object code and removed the duplicates
--or--
the b archive was used first, and it included the reference to int a(), so the a archive was ignored.
My question is: is this behavior circumstantial to my test or is it standard, and can I expect the same behavior from other compilers? Obviously duplicate code is one problem, however there could be duplicate global references as well. Is it safe/good practice to build a large application that has multiple dependency paths to the same static archive? Are there less obvious situations than just duplicate symbol names where issues can arise when doing this?
Asking this because I've been playing with this idea for a project I'm on, and want to make the right choices.
My question is: is this behavior circumstantial to my test or is it standard, and can I expect the same behavior from other compilers?
As far as the compiler itself is concerned, there is no issue: you have one definition for each function among your sources.
As far as ar is concerned, you also have no issue: neither of the archives you built contains any duplicate symbols.
Different linkers may exhibit different behaviors, however. It is conceivable that some would reject linking archives that contain duplicate external symbols. Typical UNIX linkers will handle the situation you present, but they may vary in some details, such as whether a duplicate copy of function a() is included in the binary.
Obviously duplicate code is one problem, however there could be duplicate global references as well. Is it safe/good practice to build a large application that has multiple dependency paths to the same static archive?
"Multiple paths to the same static archive" does not seem to be a good characterization of the situation you present. In neither case do you provide the same archive more than once. Rather, in the b case you provide different archives with duplicate members. Linkers generally do not have problems with specifying the same archive multiple times in the same link command. Under some circumstances it may even be necessary to do so; it should not present a problem.
Providing distinct archives with duplicate members probably will not present a problem, except possibly for bloating your code with duplicate function implementations. This is a bit less certain, but I doubt it would present a problem in practice.
Whether that's good practice is a matter of opinion, but I'm inclined to think not. It's also not clear to me what gain you see in such an approach. On the other hand, I won't be sharpening any stakes or preparing any kindling if you decide to go ahead anyway.
For the bounty: How can this behavior be disabled on a case-by-case basis without disabling or lowering the optimization level?
The following conditional expression was compiled on MinGW GCC 3.4.5, where a is of type signed long, and m is of type unsigned long.
if (!a && m > 0x002 && m < 0x111)
The CFLAGS used were -g -O2. Here is the corresponding assembly GCC output (dumped with objdump)
120: 8b 5d d0 mov ebx,DWORD PTR [ebp-0x30]
123: 85 db test ebx,ebx
125: 0f 94 c0 sete al
128: 31 d2 xor edx,edx
12a: 83 7d d4 02 cmp DWORD PTR [ebp-0x2c],0x2
12e: 0f 97 c2 seta dl
131: 85 c2 test edx,eax
133: 0f 84 1e 01 00 00 je 257 <_MyFunction+0x227>
139: 81 7d d4 10 01 00 00 cmp DWORD PTR [ebp-0x2c],0x110
140: 0f 87 11 01 00 00 ja 257 <_MyFunction+0x227>
Addresses 120-131 can easily be traced as first evaluating !a, followed by the evaluation of m > 0x002. The first conditional jump does not occur until 133. By this time, two expressions have been evaluated, regardless of the outcome of the first expression, !a. If a was nonzero (so that !a is false), the expression can (and should) be concluded immediately, which is not done here.
How does this relate to the C standard, which requires Boolean operators to short-circuit as soon as the outcome can be determined?
The C standard only specifies the behavior of an "abstract machine"; it does not specify the generation of assembly. As long as the observable behavior of a program matches that on the abstract machine, the implementation can use whatever physical mechanism it likes for implementing the language constructs. The relevant section in the standard (C99) is 5.1.2.3 Program execution.
It is probably a compiler optimization since comparing integral types has no side effects. You could try compiling without optimizations or using a function that has side effects instead of the comparison operator and see if it still does this.
For example, try
if (printf("a") || printf("b")) {
printf("c\n");
}
and it should print ac
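The same check can be made without reading console output by counting how often the right-hand operand runs. A minimal sketch with invented helper names:

```c
/* Illustrative helpers: the counter records calls to the right-hand operand. */
static int right_operand_calls;

static int always_false(void) { return 0; }
static int always_true(void)  { return 1; }
static int counted_true(void) { right_operand_calls++; return 1; }

/* false && f(): f must not run. */
int right_calls_for_false_and(void)
{
    right_operand_calls = 0;
    int result = always_false() && counted_true();
    (void)result;
    return right_operand_calls;
}

/* true || f(): f must not run. */
int right_calls_for_true_or(void)
{
    right_operand_calls = 0;
    int result = always_true() || counted_true();
    (void)result;
    return right_operand_calls;
}

/* true && f(): the outcome is still open, so f must run once. */
int right_calls_for_true_and(void)
{
    right_operand_calls = 0;
    int result = always_true() && counted_true();
    (void)result;
    return right_operand_calls;
}
```

Because the increment is a side effect, the compiler has to preserve short-circuiting here, whereas for side-effect-free integer comparisons it is free to evaluate both sides.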
As others have mentioned, this assembly output is a compiler optimization that doesn't affect program execution (as far as the compiler can tell). If you want to selectively disable this optimization, you need to tell the compiler that your variables should not be optimized across the sequence points in the code.
Sequence points are control expressions (the evaluations in if, switch, while, do and all three sections of for), logical ORs and ANDs, conditionals (?:), commas and the return statement.
To prevent compiler optimization across these points, you must declare your variable volatile. In your example, you can specify
volatile long a;
unsigned long m;
{...}
if (!a && m > 0x002 && m < 0x111) {...}
The reason that this works is that volatile is used to instruct the compiler that it can't predict the behavior of an equivalent machine with respect to the variable. Therefore, it must strictly obey the sequence points in your code.
The compiler's optimising - it gets the result into EBX, moves it to AL, part of EAX, does the second check into EDX, then branches based on the comparison of EAX and EDX. This saves a branch and leaves the code running faster, without making any difference at all in terms of side effects.
If you compile with -O0 rather than -O2, I imagine it will produce more naive assembly that more closely matches your expectations.
The code is behaving correctly (i.e., in accordance with the requirements of the language standard) either way.
It appears that you're trying to find a way to generate specific assembly code. Of two possible assembly code sequences, both of which behave the same way, you find one satisfactory and the other unsatisfactory.
The only really reliable way to guarantee the satisfactory assembly code sequence is to write the assembly code explicitly. gcc does support inline assembly.
C code specifies behavior. Assembly code specifies machine code.
But all this raises the question: why does it matter to you? (I'm not saying it shouldn't, I just don't understand why it should.)
EDIT: How exactly are a and m defined? If, as you suggest, they're related to memory-mapped devices, then they should be declared volatile -- and that might be exactly the solution to your problem. If they're just ordinary variables, then the compiler can do whatever it likes with them (as long as it doesn't affect the program's visible behavior) because you didn't ask it not to.
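If a really is a device register, a hedged sketch of the volatile approach could look like this. The register is simulated by an ordinary variable, and every name below is an assumption made for the example; on real hardware the pointer would come from the device's documented address.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped status register (hypothetical). */
static uint32_t fake_status_register;

/* The read of *status is a volatile access, so the compiler must perform it
   exactly as the abstract machine would; m is an ordinary value, and the
   compiler may still compare it in whatever order it likes. */
int device_condition(const volatile uint32_t *status, unsigned long m)
{
    return !*status && m > 0x002 && m < 0x111;
}

/* Helpers so the sketch can be exercised without real hardware. */
void set_fake_status(uint32_t value)
{
    fake_status_register = value;
}

int check_fake_device(unsigned long m)
{
    return device_condition(&fake_status_register, m);
}
```

This pins down the access to the register, not the exact instruction sequence; if a specific sequence is required, writing the assembly explicitly remains the only reliable option.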