I want to know how a static variable or function is restricted to being used only in the file it is defined in. As far as I know, such variables and functions are placed in the data section (the heap area, to be precise), but are they tagged with the file name? Suppose I fool the compiler by assigning such a static function (defined in foo.c) to a global function pointer, and call that function pointer from some other file (bar.c). My code gives no compilation warning, but it does give a segmentation fault. I assume it is a protection fault, and I am interested in knowing how that is implemented inside the system.
Thanks. MS
The linker takes care of restricting the scope of mapping the function name to the function.
There is no protection for static functions called by function pointer - it's not that uncommon an idiom. For example, the recommended way of implementing GObject methods is to expose a pointer to a static function (see the virtual public methods section in this GObject how-to)
It is 'protected' simply by not having its symbol/location made known to the linker. So you cannot write code in another module that explicitly references the static object by its symbol name, because the linker has no such symbol. There is no run-time protection.
If you pass an address to a static object to some other module at runtime, then you will then be able to access it through such a pointer. That is not "making a fool of the compiler" (or linker in fact), such action may be entirely legitimate.
The fact that you got a seg-fault is probably for an entirely different reason (an invalid pointer, for example). The compiler may choose to inline the code, in which case a pointer to it would not be possible, but if you explicitly take the address of an object, the compiler should instantiate it, so this seems unlikely.
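A minimal sketch of the situation described in the question, collapsed into one file so it compiles on its own. The names `secret_answer`, `exported_fn`, `foo_init` and `bar_call` are hypothetical; the comments mark which part would live in foo.c and which in bar.c. It also shows one plausible (assumed, not confirmed by the question) cause of a crash: calling the pointer before it has been assigned.

```c
#include <stddef.h>

/* ---- would live in foo.c ---- */
static int secret_answer(void)      /* internal linkage: name is not exported */
{
    return 42;
}

/* A global pointer; bar.c would see it via: extern int (*exported_fn)(void); */
int (*exported_fn)(void) = NULL;

void foo_init(void)
{
    /* Taking the address obliges the compiler to emit a real function body. */
    exported_fn = secret_answer;
}

/* ---- would live in bar.c ---- */
int bar_call(void)
{
    if (exported_fn == NULL)        /* calling a null pointer would crash */
        return -1;
    return exported_fn();           /* legal: reaches the static function */
}
```

Once `foo_init()` has run, `bar_call()` reaches the static function without any complaint from the compiler, linker or operating system.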
The purpose of static is not to 'protect' the variable/function but to protect the namespace and protect the rest of your program from having its behavior messed up by symbols with conflicting names. It also allows a good bit more optimization in that the compiler knows it doesn't have to facilitate access to the symbol name by outside modules.
You "may" run into a problem if foo.c and bar.c are compiled into different dynamically loaded libraries.
How is the scope of a variable implemented by compilers?
I mean, when we say a variable is static, is its scope limited to the block, or to the functions defined in the same file as the static variable?
How is this achieved in machine level or at memory level?
How actually is this restriction achieved?
How is this scoping resolved at program run time?
It is not achieved at all at the machine level. The compiler checks for scopes before machine code is actually generated. The rules of C are implemented by the compiler, not by the machine. The compiler must check those rules, the machine does not and cannot.
A very simplistic explanation of how the compiler checks this:
Whenever a scope is introduced, the compiler gives it a name and puts it in a structure (a tree) that makes it easy to determine the position of that scope in relation to other scopes, and marks it as the current scope. When a variable is declared, it is assigned to the current scope. When a variable is accessed, it is looked for in the current scope. If it is not found there, the compiler moves up the tree to the scope enclosing the current one. This continues until the topmost scope is reached. If the variable is still not found, we have a scope violation.
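The lookup described above can be sketched roughly as follows. This is an illustration only, not how any particular compiler does it; `struct scope`, `scope_declare` and `scope_lookup` are hypothetical names, and each scope holds a small fixed-size list of identifiers.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 8   /* arbitrary capacity for the sketch */

struct scope {
    const struct scope *parent;     /* enclosing scope, NULL for the topmost */
    const char *names[MAX_NAMES];   /* identifiers declared directly here */
    size_t count;
};

/* Declare a name in the given scope; returns 0 on success, -1 if full. */
int scope_declare(struct scope *s, const char *name)
{
    if (s->count >= MAX_NAMES)
        return -1;
    s->names[s->count++] = name;
    return 0;
}

/* Search the current scope, then each enclosing scope in turn.
   Returns the scope that declares the name, or NULL if none does. */
const struct scope *scope_lookup(const struct scope *current, const char *name)
{
    for (const struct scope *s = current; s != NULL; s = s->parent)
        for (size_t i = 0; i < s->count; i++)
            if (strcmp(s->names[i], name) == 0)
                return s;
    return NULL;   /* not visible from here: a compiler would report an error */
}
```

A name declared in one function's scope is found from inside that function but not from a sibling scope, because the walk only ever goes upward through `parent`.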
Inside compilers, it's implementation-defined. For example, if I were writing a compiler, I would use a tree to define 'scope', and I would store the symbol table in a binary tree.
Some would use a hash table of arbitrary depth. It's all implementation-defined.
I'm not 100% sure I understand what you are asking, but if you mean "how are static variables and functions stored in the final program", that is implementation-defined.
That said, a common way of storing such variables and functions is in the same place as any other global symbols (and some non-global ones) -- the difference is that these are not "exported", and thus not visible in any outside code trying to link to our software.
In other words, a program which has the following in it:
int var;
static int svar;
int func() { static int func_static; ... }
static int sfunc() { ... }
... might have the following layout in memory (let's say our data starts at 0xF000 and functions at 0xFF00):
0xF000: var
0xF004: svar
0xF008: func.func_static
...
0xFF00: func's code
0xFF40: sfunc's code /* assuming we needed 0x40 bytes for `func`! */
The list of exports, however, would only contain the non-static symbols, aka the exported ones:
var v 0xF000
func f 0xFF00
Again -- note how, while the static data is still written into the files (it has to be stored somewhere!), it is not exported; in layman's terms, our program does not tell anyone that it contains svar, sfunc and similar.
In Unices, you can list the symbols that a library or a program exports with the nm tool: http://unixhelp.ed.ac.uk/CGI/man-cgi?nm ; there do exist similar tools for Windows (GnuWin32 might have something similar).
In practice, executable code is often stored separately from the data (so that it can be protected from writes, for example), and both may be reordered to minimize memory use and cache misses, but the idea remains the same.
Of course, optimizations can be applied -- for example, a static function could be inlined at every invocation, meaning that no code is generated for the function itself at all, and thus it does not exist on its own anywhere.
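To see the exported versus non-exported distinction for yourself, something like the following could be compiled to an object file and inspected with nm. This is a sketch: the file name symbols.c is made up, and the letters in the comments follow the usual GNU nm convention (uppercase for global symbols, lowercase for local ones); the exact output depends on the toolchain and optimization level.

```c
/* symbols.c: compile with `cc -c symbols.c`, then run `nm symbols.o`. */

int var = 1;            /* global data symbol: uppercase letter, e.g. D */
static int svar = 2;    /* local data symbol: lowercase letter, e.g. d */

static int sfunc(void)  /* local text symbol (t), if it is emitted at all */
{
    return svar;
}

int func(void)          /* global text symbol: T */
{
    static int func_static;   /* local symbol; the name is usually decorated */
    func_static += 1;
    return func_static + sfunc();
}
```

With optimization enabled, `sfunc` may be inlined into `func` and not appear in the listing at all, which matches the point about inlining made above.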
I just wanted to verify: in VC++, are member functions that are never called considered inline functions by the compiler by default? If so, why? Why not completely discard the function (since it will never be called) instead of inlining it?
What is the advantage?
Update
The question is why even inline it when it will never be called? Why not simply discard it forever, just like some unused variables are discarded.
Member functions are considered inline without use of the inline keyword if they are defined in the body of the class definition. Whether they are called or not has nothing to do with it.
Unused member functions can't generally be discarded because their names have external linkage -- that is to say, some other translation unit or executable might call them, that hasn't even been written at the time this translation unit is compiled or this executable is linked.
Once you get to link-time, if the implementation somehow knows that this cannot happen then it could discard the code for the function. For example because the OS has no means to look up symbols in an executable, or because you've told the linker to strip them out using some implementation-defined option.
Relating this to VC++ in particular: on Windows you can look up symbols in executables if they're dllexport. So those functions won't generally be discarded even at link time, and other unused functions can't be discarded at compile time just because this TU doesn't use it. For most classes defined in the usual way, with a header file that declares the member functions and a source file that defines them, the functions are unused in that source file. So if the compiler discarded them because they were unused in that TU, nothing would ever work.
I think (I'm not sure) that whether the function is inline or not is relevant to whether it can be discarded, but might not mean that it can be entirely discarded. It's true that if it's inline, and someone calls it, then that someone must have the definition of the function in their TU. So in some sense the function is "not needed". However, any static local variables must be shared no matter what TU it's called from, and the address of the function itself must be the same no matter what TU it's taken in. So there may still have to be "something" there even if it's not the full code for the function.
But as I said -- even if inline functions can be discarded when unused, not all unused functions are inline.
Inline it where? It's never called, so it's impossible to inline it into any call site.
The Standard mandates whether a function is or is not considered inline. Whether or not it is called is irrelevant.
I understand that a static function in C allows that particular function to be called only within the confines of that file. What I am interested in is how this occurs. Is it being placed into a specific part of memory, or is the compiler applying a specific operation to that function? Can this same process be applied to a function call in assembly?
Declaring a function static doesn't really prevent it from being called from other translation units.
What static does is prevent the function from being referred to (linked) from other translation units by name. That eliminates the possibility of direct calls to that function, i.e. calls "by name". To achieve that, the compiler simply excludes the function name from the table of external names exported from the translation unit. Other than that, there's absolutely nothing special about static functions.
You still can call that function from other translation units by other means. For example, if you somehow obtained a pointer to static function in other translation unit, you can call it through that pointer.
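One common way to hand out such pointers is a table of function pointers with external linkage. The sketch below assumes hypothetical names (`struct shape_ops`, `rect_ops`, `describe_rect`) and is collapsed into a single file; the comments mark where each part would normally live.

```c
/* ---- would live in shape.c ---- */
struct shape_ops {
    int (*area)(int w, int h);
    int (*perimeter)(int w, int h);
};

static int rect_area(int w, int h)      { return w * h; }
static int rect_perimeter(int w, int h) { return 2 * (w + h); }

/* Only the table is exported; the functions themselves stay static. */
const struct shape_ops rect_ops = { rect_area, rect_perimeter };

/* ---- would live in client.c, which only sees:
        extern const struct shape_ops rect_ops; ---- */
int describe_rect(int w, int h)
{
    /* Indirect calls through the table reach the static functions. */
    return rect_ops.area(w, h) + rect_ops.perimeter(w, h);
}
```

The client can never write `rect_area(3, 4)` directly, because that name is not visible to the linker, but it can call the same code through `rect_ops.area`.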
It doesn't make it into the object's name table which prevents it from being linked into other stuff.
Functions and other names are exported as symbols in the object file. The linker uses these symbols to resolve all sorts of dangling references at link time (e.g. a call to a function defined in another file). When you declare it static, simply it won't be exported as a symbol. Therefore it won't be picked up by any other file. You could still call it from another file if you had a function pointer to it.
It's in fact the opposite. When a function is not static, its name is written somewhere in the object file, which the linker can then use to link other object files using this function, to the address of that function.
When the function is declared static, the compiler simply doesn't put the name there.
So I'm working on a "quick and dirty" profiler for firmware; I just need to know how long some functions take. Merely printing the time it takes every time will skew the results, as logging is expensive, so I am saving a bunch of results to an array and dumping that after some time.
When working in one compilation unit (one source file), I just had a bunch of static arrays storing the results. Now I need to do this across several files. I could "copy paste" the code, but that would be just ugly (bear with me). If I put the timing code in a separate compilation unit, make the variables static, and provide accessor functions in the header file, I will be incurring the overhead of function calls every time I want to access those static variables.
Is it possible to access static variables of a compilation unit directly?
I've always tried to encapsulate data, and not use global variables, but this situation calls for it simply due to speed concerns.
I hope this makes sense! Thank you!
EDIT: Alright, so it appears what I'm asking is impossible- do any of you see alternatives that essentially allow me to directly access data of another compilation unit?
EDIT2: Thank you for the answers Pablo and Jonathan. I ended up accepting Pablo's because I didn't have clear place to get the pointer to the static data (as per Jonathan) in my situation. Thanks again!
No, it's not possible to access static variables of a compilation unit by name from another one. The static keyword exists precisely to prevent that.
If you need to access globals of one compilation unit from another, you can do:
file1.c:
int var_from_file1 = 10;
file2.c:
extern int var_from_file1;
// you can access var_from_file1 here
If you can remove the static keyword from your declarations, you should be fine. I understand that changing existing source code is not always an option (e.g. when dealing with existing legacy compiled code).
To get at the static variables in a compilation unit C1 from another unit C2, some function in C1 must make pointers to the variables available to C2, or some non-static variable must contain a pointer to the static variables.
So, you could package the 'static variables' into a single structure, and then write a function that returns a pointer to that structure; you can call that function to gain access to the static variables.
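A minimal sketch of that approach for the profiler case, assuming hypothetical names (`struct prof_data`, `prof_get`, `record_sample`) and an arbitrary capacity. The accessor is called once to obtain the pointer; after that, every access is a plain memory access with no function-call overhead.

```c
#include <stddef.h>

#define PROF_SLOTS 64   /* hypothetical capacity */

/* Would be declared in a shared header, e.g. profiler.h. */
struct prof_data {
    unsigned long samples[PROF_SLOTS];
    size_t count;
};

/* ---- would live in profiler.c ---- */
static struct prof_data prof;   /* internal linkage: invisible by name */

struct prof_data *prof_get(void)
{
    return &prof;               /* the only exported way in */
}

/* ---- would live in any other file ---- */
void record_sample(struct prof_data *p, unsigned long ticks)
{
    if (p->count < PROF_SLOTS)  /* silently drop samples once full */
        p->samples[p->count++] = ticks;
}
```

A caller would typically do `struct prof_data *p = prof_get();` once at start-up, keep `p` around, and pass it to `record_sample` from the timed code.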
Similar rules apply to static functions; if some function (or non-static variable) in the file makes the pointers to the functions available, then the static functions can be called indirectly from outside the file.
If access via pointers doesn't count as directly, then you are snookered; static hides and you can't unhide except by removing the keyword static from the variables when the module is compiled - maybe via the C preprocessor. Beware name clashes.
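The preprocessor route mentioned above might look like this. It is a sketch under assumptions: the macro names `PRIVATE` and `EXPOSE_INTERNALS` and the variable `timing_total` are invented for illustration.

```c
/* Build normally to keep the variable private to this file;
   build with -DEXPOSE_INTERNALS to give it external linkage. */
#ifdef EXPOSE_INTERNALS
#define PRIVATE            /* expands to nothing */
#else
#define PRIVATE static     /* default: internal linkage */
#endif

PRIVATE unsigned long timing_total = 0;

void timing_add(unsigned long ticks)
{
    timing_total += ticks;
}

unsigned long timing_read(void)
{
    return timing_total;
}
```

When built with `-DEXPOSE_INTERNALS`, another file could declare `extern unsigned long timing_total;` and touch it directly, at which point the name must be unique across the whole program.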
I wonder about the use of the static keyword as scope limiting for variables in a file, in C.
The standard way to build a C program as I see it is to:
have a bunch of c files defining functions and variables, possibly scope limited with static.
have a bunch of h files declaring the functions and possibly variables of the corresponding c file, for other c files to use. Private functions and variables are not published in the h file.
every c file is compiled separately to an o file.
all o files are linked together to an application file.
I see two reasons for declaring a global as static, if the variable is not published in the h file anyway:
one is for readability. Inform future readers including myself that a variable is not accessed in any other file.
the second is to prevent another c file from redeclaring the variable as extern. I suppose that the linker would dislike a variable being both extern and static. (I dislike the idea of a file redeclaring a variable owned by someone else as extern, is it ok practice?)
Any other reason?
Same goes for static functions. If the prototype is not published in the h file, other files may not use the function anyway, so why define it static at all?
I can see the same two reasons, but no more.
When you talk about informing other readers, consider the compiler itself as a reader. If a variable is declared static, that can affect the degree to which optimizations kick in.
Redefining a static variable as extern is impossible, but the compiler will (as usual) give you enough rope to hang yourself.
If I write static int foo; in one file and int foo; in another, they are considered different variables, despite having the same name and type - the compiler will not complain but you will probably get very confused later trying to read and/or debug the code. (If I write extern int foo; in the second case, that will fail to link unless I declare a non-static int foo; somewhere else.)
Global variables rarely appear in header files, but when they do they should be declared extern. If not, depending on your compiler, you risk that every source file which includes that header will declare its own copy of the variable: at best this will cause a link failure (multiply-defined symbol) and at worst several confusing cases of overshadowing.
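The usual pattern is an extern declaration in the header and exactly one definition in a source file. The sketch below uses hypothetical names (`shared_counter`, `counter_bump`, `counter_bump_twice`) and is collapsed into one file so that it compiles on its own; the comments mark the file boundaries.

```c
/* ---- counter.h: declarations only, safe to include everywhere ---- */
extern int shared_counter;
void counter_bump(void);

/* ---- counter.c: the single definition ---- */
int shared_counter = 0;

void counter_bump(void)
{
    shared_counter++;
}

/* ---- any other file that includes counter.h ---- */
int counter_bump_twice(void)
{
    counter_bump();
    counter_bump();
    return shared_counter;   /* same object that counter.c defines */
}
```

Because the header only declares the variable, including it from many source files does not create extra copies.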
By declaring a variable static at file level (static within a function has a different meaning) you forbid other units from accessing it by name; e.g. if you try to use the variable inside another unit (declared there with extern), the linker won't find the symbol.
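When you declare a function static, every call to it by name comes from the same translation unit, so the compiler can emit a direct call or inline it. This is sometimes described as a "near call" as opposed to a "far call", and in theory it performs better, although the actual difference depends on the platform and toolchain.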
If a global variable is declared static, the compiler can sometimes make better optimizations than if it were not. Because the compiler knows that the variable cannot be accessed from other source files, it can make better deductions about what your code is doing (such as "this function does not modify this variable"), which can sometimes cause it to generate faster code. Very few compilers/linkers can make these sorts of optimizations across different translation units.
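A small illustration of the kind of deduction meant here. Whether a given compiler actually performs it is an assumption that depends on the compiler and optimization level; the names `scale`, `scaled`, `global_scale` and `scaled_global` are hypothetical.

```c
/* Never written after initialization, and its address is never taken. */
static int scale = 3;

int scaled(int x)
{
    /* No other file can name `scale`, so the compiler can see every
       access to it and may replace this load with the constant 3. */
    return x * scale;
}

/* External linkage: some other file could modify this at any time. */
int global_scale = 3;

int scaled_global(int x)
{
    /* Without whole-program analysis, the value must be loaded at run time. */
    return x * global_scale;
}
```

Comparing the generated assembly for the two functions at a higher optimization level is one way to check what your own compiler does.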
If you declare a variable foo in file a.c without making it static, and a variable foo in file b.c without making it static, both automatically have external linkage, which means the linker may complain if you initialise both, and may assign both the same memory location if it doesn't complain. Expect fun debugging your code.
If you write a function foo () in file a.c without making it static, and a function foo () in file b.c without making it static, the linker may complain, but if it doesn't, all calls to foo () will call the same function. Expect fun debugging your code.
My favorite use of static is for methods that I don't have to inject, or create an object, to use. The way I see it, private static methods are always useful, whereas with public static methods you have to spend more time thinking about what you're doing, to avoid what crazyscot described as being given enough rope to hang yourself!
I like to keep a folder of helper classes in most of my projects, consisting mainly of static methods, for doing things quickly and efficiently on the fly, no objects needed!