Scope rules in C: Nested blocks

I have the following nested function:
int main()
{
int a, b, c;
a = 10;
int foo()
{
int a, b, c;
//some more code here
}
// some more code here
}
Now, I need to assign the variable a that belongs to foo() the value of the variable a that belongs to main(). Basically, something like foo.a = main.a is what I'm looking for.
Is there any way of doing this kind of assignment? I read through scope rules here and here, but didn't find anything I could use in this situation.
I know that using a nested function is not advisable, but I'm working on preexisting code, and I don't have permission to change the structure of the code.
How do I proceed?

Setting aside the nested-function part, AFAIK C does not provide any direct way to access a shadowed variable.
Primary advice: do not use this approach. Always use separate variable names for inner scopes, and pass -Wshadow to gcc to detect and avoid possible shadowing.
However, if you really must use the same variable name in the inner and outer scopes and need to access the outer-scope variable from the inner scope, your best bet is to do the following, in this order, inside the inner block:
1. Declare a pointer and assign it the address of the outer variable.
2. Declare and define the local variable.
3. Use both: the local variable directly, the outer one through the pointer.
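A minimal sketch of those three steps, using a plain nested block rather than a GNU nested function so that it compiles as standard C99. The function name shadow_demo and its out-parameter are hypothetical, added only so the result can be observed. Assuming GCC's nested-function extension, the same pattern should also work at the top of foo(), before foo() declares its own a.

```c
/* Hypothetical demo: returns the inner a; *outer_after receives the
   outer a after the inner block has ended. */
int shadow_demo(int *outer_after)
{
    int a = 10;                 /* outer a */
    int inner_result;
    {
        int *outer_a = &a;      /* step 1: the inner a is not declared yet,
                                   so &a is the address of the outer a */
        int a = *outer_a;       /* step 2: declare and define the inner a;
                                   from here on, a names the inner variable */
        a += 5;                 /* step 3: use both; this changes only the inner a */
        *outer_a += 1;          /* ...and this changes the hidden outer a */
        inner_result = a;
    }
    *outer_after = a;           /* past the block: a is the outer variable again */
    return inner_result;
}
```

Calling shadow_demo(&outer) returns 15 and leaves outer at 11: the inner a started as a copy of the outer one, and the write through outer_a reached the outer variable while it was hidden. Compiled with -Wshadow, gcc will (intentionally) warn about the inner a.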
Note: As a general word of advice, please try not to write new code in this manner (I understand the maintenance situation). It is both hard to manage and hard to read.

Related

What would be the correct variable name?

In an implementation for a real-time embedded device, I have a status register variable for each channel (let's blindly assume my embedded device has multiple channels and some work has to be done for each of them).
So here's how the status variable is currently declared:
struct channel_status status[NCHANNELS];
For performance reasons, it is better to use an intermediate global variable that holds a copy of the status for the selected channel.
Example:
struct channel_status status_shadow;   /* copy of the selected channel's status */
void some_work() {
    for (int channel = 0; channel < NCHANNELS; channel++) {
        status_shadow = status[channel];   /* copy in */
        foo(); // Functions that use status_shadow as a global
        bar(); // "
        baz(); // "
        status[channel] = status_shadow;   /* copy out */
    }
}
I am not asking about the implementation, nor about the possibility of using a pointer instead of a variable. My question is about the name of the intermediate variable.
I chose status_shadow because I think I am doing some kind of shadowing.
Is there a better/more accurate technical name for such an intermediate variable?
Implementation considerations:
The reason why I decided to use this intermediate variable is that it is too resource-consuming to pass either the channel pointer i or the status variable to each function foo, bar, baz, ... In terms of performance, avoiding stack push/pop operations can save precious time in real-time applications.
You are not technically shadowing; you would have to define a variable of the same name to shadow it. Moreover, shadowing is generally frowned upon because careless use can easily lead to confusion.
What you are doing is taking the current item of your loop, so a suitable name could be current_status or cur_status. If you passed it as a parameter, so that the name was confined to the for(), current or cur_item would work as well.
Another idea could be temp_channel_status, implying that the value is not to be considered fixed even though the variable is global.
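For illustration, a compilable version of the question's copy-in/copy-out loop using the cur_status name suggested above. The fields of struct channel_status, the value of NCHANNELS, and what foo() and bar() do are all hypothetical, since the question does not show them.

```c
#define NCHANNELS 4                        /* hypothetical channel count */

struct channel_status {
    unsigned ticks;                        /* hypothetical field */
    unsigned errors;                       /* hypothetical field */
};

struct channel_status status[NCHANNELS];   /* one status per channel */
struct channel_status cur_status;          /* working copy of the channel being processed */

/* Hypothetical workers: they use only the global working copy, no parameters. */
static void foo(void) { cur_status.ticks++; }
static void bar(void) { if (cur_status.ticks > 2) cur_status.errors++; }

void some_work(void)
{
    for (int channel = 0; channel < NCHANNELS; channel++) {
        cur_status = status[channel];      /* copy in */
        foo();
        bar();
        status[channel] = cur_status;      /* copy out */
    }
}
```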
I would like a name such as work_status or status_copy.
You could use status_local, or status_local_copy.

Verifying data types/structs in a parser

I'm writing a recursive descent parser, and I'm at the point where I'm unsure how to validate everything. I'm not even sure whether I should be doing this at the parsing stage. What I mean is, I could have some syntax, e.g.:
int x = 5
int x = 5
That would be syntactically valid, so should the parser check whether x has already been defined? If so, would I use a hashmap? And what kind of information would I need to store? For example, how can I handle the scope of a variable, since x could be defined both in the global scope and locally in a function:
int x = 5;
void main() {
int x = 2;
}
And finally, when I store names in the hashmap, how can I differentiate between kinds of names? For example, I could have a variable called foo and a struct also called foo, so putting foo in a hashmap will probably cause some errors. I'm thinking I could prefix the key, e.g. storing struct_xyz as the hashmap key for a struct named xyz, and int_xyz for a variable?
Thanks :)
I'm going to assume that regardless of which approach you choose, your parser will be constructing some kind of abstract syntax tree. You now have two options. Either the parser populates the tree with identifier nodes that store the name of the variable or function they reference, leaving scope resolution to a later pass, as advocated in many compiler textbooks.
The other option is to have the parser immediately look the identifier up in a symbol table that it builds as it goes, and store a pointer to the symbol in the abstract syntax tree node instead. This approach tends to work well if your language doesn't allow implicit forward-references to names that haven't been declared yet.
I recently implemented the latter approach in a compiler that I'm working on, and I've been very pleased with the result so far. I will briefly describe my solution below.
Symbols are stored in a structure that looks something like this:
typedef struct symbol {
    char *name;
    struct type *type;    // Type information (defined elsewhere).
    struct scope *scope;  // Points to the scope in which the symbol was defined.
} Symbol;
So what is this Scope thing? The language I'm compiling is lexically scoped, and each function definition, block, etc, introduces a new scope. Scopes form a stack where the bottom element is the global scope. Here's the structure:
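typedef struct scope {
    struct scope *parent;  // Enclosing scope; NULL for the global scope.
    Symbol **buckets;      // Hash table of Symbol pointers.
    size_t nbuckets;       // Number of buckets.
} Scope;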
The buckets and nbuckets fields are a hash map of identifiers (strings) to Symbol pointers. By following the parent pointers, one can walk the scope stack while searching for an identifier.
With the data structures in place, it's easy to write a parser that resolves names in accordance with the rules of lexical scoping.
Upon encountering a statement or declaration that introduces a new scope (such as a function declaration or a block statement), the parser pushes a new Scope onto the stack. The new scope's parent field points to the old scope.
When the parser sees an identifier, it tries to look it up in the current scope. If the lookup fails in the current scope, it continues recursively in the parent scope, etc. If no corresponding Symbol can be found, an error is raised. If the lookup is successful, the parser creates an AST node with a pointer to the symbol.
Finally, when a variable or function declaration is encountered, it is bound in the current scope.
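The steps above (push a scope on entry, resolve names by walking parent pointers, bind declarations in the current scope) can be sketched in C roughly as follows. This is not the answerer's code: it assumes separate chaining (a next pointer in each symbol), a fixed bucket count, and no type field, and all function names are hypothetical. Allocation failures are not checked.

```c
#include <stdlib.h>
#include <string.h>

typedef struct scope Scope;

typedef struct symbol {
    char *name;
    Scope *scope;              /* scope in which the symbol was defined */
    struct symbol *next;       /* next symbol in the same hash bucket */
} Symbol;

struct scope {
    Scope *parent;             /* enclosing scope; NULL for the global scope */
    Symbol **buckets;          /* bucket index -> chain of symbols */
    size_t nbuckets;
};

static size_t hash_name(const char *s, size_t nbuckets)
{
    size_t h = 5381;           /* djb2-style string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % nbuckets;
}

/* Entering a function body or block: the new scope's parent is the old one. */
Scope *scope_push(Scope *current)
{
    Scope *s = malloc(sizeof *s);
    s->parent = current;
    s->nbuckets = 16;
    s->buckets = calloc(s->nbuckets, sizeof *s->buckets);
    return s;
}

/* Leaving a scope. Nothing is freed: symbols (and the AST nodes that
   point at them) still reference this scope. */
Scope *scope_pop(Scope *s)
{
    return s->parent;
}

/* Bind a declaration in the current scope; NULL means the name is
   already declared in this same scope. */
Symbol *scope_bind(Scope *s, const char *name)
{
    size_t h = hash_name(name, s->nbuckets);
    for (Symbol *p = s->buckets[h]; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return NULL;
    Symbol *sym = malloc(sizeof *sym);
    sym->name = malloc(strlen(name) + 1);
    strcpy(sym->name, name);
    sym->scope = s;
    sym->next = s->buckets[h];
    s->buckets[h] = sym;
    return sym;
}

/* Resolve an identifier: current scope first, then each parent.
   NULL means "undeclared identifier". */
Symbol *scope_lookup(Scope *s, const char *name)
{
    for (; s; s = s->parent) {
        size_t h = hash_name(name, s->nbuckets);
        for (Symbol *p = s->buckets[h]; p; p = p->next)
            if (strcmp(p->name, name) == 0)
                return p;
    }
    return NULL;
}
```

With this sketch, the question's repeated int x = 5 is caught because scope_bind returns NULL for a second x in the same scope, while the x inside main() is bound in a new scope and simply hides the global one during lookup.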
Some languages use more than one namespace. For instance, in Erlang, functions and variables occupy different namespaces, requiring awkward syntax like fun foo:bar/1 to get at the value of a function. This is easily implemented in the model I outlined above by keeping several Scope stacks - one for each namespace.
If we define "scope" or "context" as a mapping from variable names to types (and possibly some more information, such as scope depth), then its natural implementation is either a hashmap or some sort of search tree. Upon reaching any variable definition, the compiler should insert the name with its corresponding type into this data structure. When some sort of 'end scope' operator is encountered, we must already have enough information to 'backtrack' the changes in this mapping to its previous state.
For a hashmap implementation, for each variable definition we can store the previous mapping for that name, and restore it when we reach the 'end of scope' operator. We should keep a stack of stacks of these changes (one stack for each currently open scope), and backtrack the topmost stack of changes at the end of each scope.
One drawback of this approach is that we must either complete compilation in one pass, or store the mapping for each identifier in the program somewhere, as we can't inspect any scope more than once, or in any order other than the order of appearance in the source file (or AST).
For a tree-based implementation, this is easily achieved with so-called persistent trees. We just maintain a stack of trees, one for each scope, pushing when we open a scope and popping when it ends.
The 'depth of scope' is enough to decide what to do when a new variable name conflicts with one already in the mapping. Just check that old depth < new depth; if so, overwrite, otherwise report an error.
To differentiate between function and variable names, you can use separate (yet similar or identical) mappings for those objects. If some context permits only a function name or only a variable name, you already know where to look. If both are permitted in some context, perform a lookup in both structures, and report an "ambiguity error" if the name corresponds to both a function and a variable.
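A sketch of the hashmap variant with backtracking and the depth check, assuming a tiny linear-search table as a stand-in for a real hashmap and storing types as strings for brevity. The "stack of stacks" is flattened into one undo log plus the log position at which each open scope started, which behaves the same way. All names and size limits are hypothetical, and overflow is not checked.

```c
#include <string.h>

#define MAX_NAMES  64
#define MAX_UNDO  256
#define MAX_DEPTH  32

typedef struct {
    const char *type;     /* declared type, as a string for brevity */
    int depth;            /* scope depth of the binding; 0 = unbound */
} Binding;

/* Stand-in for the hashmap: names[i] maps to current[i]. */
static const char *names[MAX_NAMES];
static Binding current[MAX_NAMES];
static int nnames;

/* Undo log: which slot changed and what it held before. */
static struct { int slot; Binding old; } undo_log[MAX_UNDO];
static int nundo;
static int scope_start[MAX_DEPTH];   /* undo-log length when each scope opened */
static int depth;                    /* number of currently open scopes */

static int find_slot(const char *name)
{
    for (int i = 0; i < nnames; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    return -1;
}

void ctx_open(void)
{
    scope_start[depth++] = nundo;
}

/* Returns 0 (redefinition error) unless old depth < new depth. */
int ctx_define(const char *name, const char *type)
{
    int slot = find_slot(name);
    if (slot < 0) {
        slot = nnames++;
        names[slot] = name;          /* caller must keep the string alive */
        current[slot].type = NULL;
        current[slot].depth = 0;
    }
    if (!(current[slot].depth < depth))
        return 0;
    undo_log[nundo].slot = slot;     /* remember the previous mapping */
    undo_log[nundo].old = current[slot];
    nundo++;
    current[slot].type = type;
    current[slot].depth = depth;
    return 1;
}

/* End of scope: restore every mapping this scope changed, newest first. */
void ctx_close(void)
{
    int start = scope_start[--depth];
    while (nundo > start) {
        nundo--;
        current[undo_log[nundo].slot] = undo_log[nundo].old;
    }
}

/* Returns the visible type of name, or NULL if it is not bound. */
const char *ctx_lookup(const char *name)
{
    int slot = find_slot(name);
    return (slot >= 0 && current[slot].depth > 0) ? current[slot].type : NULL;
}
```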
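A minimal sketch of the separate-mappings idea, with two fixed arrays standing in for the real variable and function tables (their contents are made up). The same approach fits the struct-versus-variable question: C itself keeps struct tags in a different name space from ordinary identifiers, so a separate table for tags mirrors the language more closely than prefixing keys.

```c
#include <stddef.h>
#include <string.h>

typedef enum { NAME_UNKNOWN, NAME_VARIABLE, NAME_FUNCTION, NAME_AMBIGUOUS } NameKind;

/* Hypothetical contents of the two separate mappings. */
static const char *const variable_names[] = { "x", "count", "foo" };
static const char *const function_names[] = { "main", "foo" };

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

static int contains(const char *const *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i], name) == 0)
            return 1;
    return 0;
}

/* A context that permits either kind of name: look in both mappings. */
NameKind classify(const char *name)
{
    int is_var = contains(variable_names, COUNT(variable_names), name);
    int is_fn  = contains(function_names, COUNT(function_names), name);
    if (is_var && is_fn)
        return NAME_AMBIGUOUS;       /* report an "ambiguity error" */
    if (is_var)
        return NAME_VARIABLE;
    if (is_fn)
        return NAME_FUNCTION;
    return NAME_UNKNOWN;
}
```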
The best way is to use a class in which you define structures like a HashMap that let you check the type and/or the existence of a variable. This class should have static methods that interface with the grammar rules written in the parser.

Multiple instances of a variable (static, non-static)

I came across this piece of C code:
#include <stdio.h>

int main(void){
    static int i=0;
    i++;
    if(i<=5){
        int i = 3;
        printf(" %d",i);
        main();
    }
    return 0;
}
1. First, I expected this code to give a compilation error, as there are multiple definitions of the variable i. But it compiled, ran successfully, and gave this output:
3 3 3 3 3
2. Observing the output, 3 is printed exactly 5 times, which means the loop was counted from 0 to 5, implying that the if condition used the first (static) definition of i.
3. However, the value being printed is 3, which is the second definition of i.
So the variable label i is referring to two different instances in memory. One is being used as the loop count, to do the increment, and the other is the value being printed.
The only way I can somehow explain this is:
int i = 3 (the 2nd definition) is repeated in every recursive call. That instance of i is created when the function is called, and killed when the next recursive call is made (because of static scoping). printf uses this instance, as it is the latest definition(?)
When entering a new level of recursion, i++ is executed. Since there is no other way to resolve this i, it uses the static "instance" of i, which is still "alive" in the code as it was defined as static.
However, I'm unable to put my finger on exactly how this works. Can anyone explain what's going on here, in the code and in memory?
How is the variable binding being done by the compiler here?
The inner scope wins.
Example:
int i = 1;
void foo() {
int i = 2; // hides the global i
{
int i = 3; // hides local i
}
}
This behavior is by design. What you can do is use different naming conventions for variable scopes:
global/statics
function arguments
locals
class/struct members
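Some compilers will issue a warning if you hide a variable in the same function (e.g. a local variable in a nested block with the same name as a function argument; redeclaring an argument in the function's outermost block is an error in C). So use the maximum warning level on your compiler; with GCC, -Wshadow is not enabled by -Wall or -Wextra, so add it explicitly.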
The compiler will always use the most local version of a variable when more than one variable of that name exists.
Outside the if block, the first i is the only one that exists, so it is the one that is checked. Then a new i is created, with value 3. From that point on, whenever you refer to i, the compiler assumes you mean the second one, since it is more local. When you leave the block, the second i goes out of scope and ceases to exist, so if you refer to i again it will be the first one.
The {} of the if statement creates a new block scope and when you declare i in that scope you are hiding the i in the outer scope. The new scope does not start until { and thus the if statement is referring to the i in the outer scope.
Hiding is covered in the draft C99 standard, section 6.2.1 Scopes of identifiers; paragraph 4 says (emphasis mine):
[...] If an identifier designates two different entities in the same name space, the scopes might overlap. If so, the scope of one entity (the inner scope) will be a strict subset of the scope of the other entity (the outer scope). Within the inner scope, the identifier designates the entity declared in the inner scope; the entity declared in the outer scope is hidden (and not visible) within the inner scope.
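To make the two objects visible, here is the question's code rewritten as an ordinary function that records each value instead of printing it; the name recurse and the printed buffer are hypothetical.

```c
static int printed[8];   /* values the original code would print */
static int nprinted;

int recurse(void)
{
    static int i = 0;    /* a single object shared by every call */
    i++;                 /* the block below has not started, so this is the static i */
    if (i <= 5) {
        int i = 3;       /* a new automatic object in each call's frame;
                            hides the static i until the closing brace */
        printed[nprinted++] = i;
        recurse();
    }
    return i;            /* past the brace: the static i again */
}
```

Calling recurse() once records five 3s and returns 6, the final value of the static i. Note that each call's inner i stays alive until that call leaves its block, which happens only after the nested recursive call returns, so at the deepest point all five inner i objects coexist on the stack.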

Using C variable inside Lua alongside nested functions

This is a sort of followup to my previous question about nested registered C functions found here:
Trying to call a function in Lua with nested tables
The previous question gave me the answer to adding a nested function like this:
dog.beagle.fetch()
I also would like to have variables at that level like:
dog.beagle.name
dog.beagle.microchipID
I want this string and number to be allocated in C and accessible by Lua. So, in C code, the variables might be defined as:
int microchipIDNumber;
char dogname[500];
The C variables need to be updated by assignments in Lua, and their values need to be retrieved by Lua when they appear on the right-hand side of an assignment. I have tried the __index and __newindex metamethod concept, but everything I try seems to break down when I have 2 dots in the Lua path to the variable. I know I am probably making it more complicated with the 2 dots, but it makes the organization much easier to read in the Lua code. I also need to get an event for the assignment, because I need to spin up some hardware when the microchipIDNumber value changes. I assume I can do this through __newindex while I am setting the value.
Any ideas on how you would code the metatables and methods to accomplish the nesting? Could it be because my previous function declarations are confusing Lua?
The colon operator (:) in Lua is used only for functions. Consider the following example:
meta = {}
meta["__index"] = function(n,m) print(n) print(m) return m end
object = {}
setmetatable(object,meta)
print(object.foo)
The index function will simply print the two arguments it is passed and return the second one (which we also print, because just writing object.foo on its own is a syntax error). The output will be table: 0x153e6d0, foo and foo, each on its own line. So __index receives the object in which we're looking up the variable, and the variable's name. Now, if we replace object.foo with object:foo we get this:
input:5: function arguments expected near ')'
This is because object:foo(...) is syntactic sugar for object.foo(object, ...), so Lua expects you to provide arguments for a function call. If we do provide arguments (object:foo("bar")), we get this:
table: 0x222b3b0
foo
input:5: attempt to call method 'foo' (a string value)
So our __index function still gets called, but it is not passed the call's arguments; Lua simply attempts to call the value it returned (here the string "foo"). So don't use : for members.
With that out of the way, let's look at how you can sync variables between Lua and C. This is actually quite involved and there are different ways to do it. One solution would be to use a combination of __index and __newindex. If you have a beagle structure in C, I'd recommend making these C functions and pushing them into the metatable of a Lua table as C-closures with a pointer to your C struct as an upvalue. Look at this for some info on lua_pushcclosure and this on closures in Lua in general.
If you don't have a single structure you can reference, it gets a lot more complicated, since you'll have to somehow store variableName-variableLocation pairs on the C side and know what type each one is. You could maintain such a list in the actual Lua table, so dog.beagle would be a map from variable name to one or two somethings. There are a couple of options for this 'something'. The first is one light userdata (i.e. a C pointer), but then you'll have the issue of figuring out what it points to, so that you know what Lua type to push for __index and what to pop for __newindex. The other option is to push two functions/closures. You can make a C function for each type you'll have to handle (number, string, table, etc.) and push the appropriate one for each variable, or make an uber-closure that takes a parameter saying what type it's being given, and then just vary the upvalues you push it with. In this case the __index and __newindex functions will simply look up the appropriate function for a given variable name and call it, so it would probably be easiest to implement them in Lua.
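The C side of such a "variable name to location and type" map can be sketched without any Lua at all; an __index or __newindex C function would then only need to move values between the Lua stack and these helpers. Apart from microchipIDNumber and dogname, every name below is hypothetical, and spin_up_hardware is a placeholder for the real hardware hook the question mentions.

```c
#include <stdio.h>
#include <string.h>

/* The C variables from the question. */
int microchipIDNumber;
char dogname[500];

typedef enum { VAR_INT, VAR_STRING } VarType;

/* Hypothetical entry: where a variable lives, its type, its size, and an
   optional hook run after every assignment. */
typedef struct {
    const char *name;                    /* key as seen from Lua */
    VarType type;
    void *location;
    size_t size;
    void (*on_change)(const char *name);
} VarEntry;

static int hook_calls;                   /* counts hook invocations */

/* Placeholder for the real "spin up some hardware" action. */
static void spin_up_hardware(const char *name)
{
    (void)name;
    hook_calls++;
}

/* The fields reachable as dog.beagle.<key>. */
static VarEntry beagle_vars[] = {
    { "microchipID", VAR_INT,    &microchipIDNumber, sizeof microchipIDNumber, spin_up_hardware },
    { "name",        VAR_STRING, dogname,            sizeof dogname,           NULL },
};

static VarEntry *find_var(const char *name)
{
    for (size_t i = 0; i < sizeof beagle_vars / sizeof beagle_vars[0]; i++)
        if (strcmp(beagle_vars[i].name, name) == 0)
            return &beagle_vars[i];
    return NULL;                         /* unknown key */
}

/* What __index would call for an integer field; 0 means bad key or type. */
int get_int(const char *name, int *out)
{
    VarEntry *v = find_var(name);
    if (!v || v->type != VAR_INT)
        return 0;
    *out = *(int *)v->location;
    return 1;
}

/* What __newindex would call after reading an integer off the Lua stack. */
int set_int(const char *name, int value)
{
    VarEntry *v = find_var(name);
    if (!v || v->type != VAR_INT)
        return 0;
    *(int *)v->location = value;
    if (v->on_change)
        v->on_change(name);              /* the assignment event */
    return 1;
}

/* Same for strings; snprintf truncates rather than overflowing dogname. */
int set_string(const char *name, const char *value)
{
    VarEntry *v = find_var(name);
    if (!v || v->type != VAR_STRING)
        return 0;
    snprintf((char *)v->location, v->size, "%s", value);
    if (v->on_change)
        v->on_change(name);
    return 1;
}
```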
In the case of two functions your dog.beagle might look something like this (not actual Lua syntax):
dog.beagle = {
__metatable = {
__index = function(table,key)
local getFunc = rawget(table,key).get
return getFunc(table,key)
end
__newindex = function(table,key,value)
local setFunc = rawget(table,key).set
setFunc(table,key,value)
end
}
"color" = {
"set" = *C function for setting color or closure with an upvalue to tell it's given a color*,
"get" = *C function for getting color or closure with an upvalue to tell it to return a color*
}
}
Notes about the above:
1. Don't set an object's __metatable field directly; it's used to hide the real metatable. Use setmetatable(object,metatable).
2. Notice the usage of rawget. We need it because otherwise trying to get a field of the object from within __index would be an infinite recursion.
3. You'll have to do a bit more error checking in the event rawget(table,key) returns nil, or if what it returns does not have get/set members.

What is the issue with this C function that contains a function?

My professor showed us this code:
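int (*timerX(int x))(int){   /* returns a pointer to a function taking and returning int */
    int times(int y){        /* nested function: GNU C extension */
        return x * y;
    }
    return times;
}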
How does this work in C (using the GCC compiler)? He said that as soon as the function disappears, the inner function disappears? I appreciate any tips or advice.
It's called a nested function, a GNU C extension. Basically:
the inner function can access the local variables of the outer function (the ones declared before its definition);
the inner function can be called from outside only through a function pointer, and not after the containing function has returned if the inner function accesses objects of its parent.
In your example, times uses x, so calling the returned pointer after timerX has returned is undefined behavior.
If you try to call the nested function through its address after the containing function has exited, all hell will break loose.
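The usual portable workaround, which is a different technique from GCC nested functions, is to carry the captured value explicitly alongside an ordinary function pointer, so nothing depends on timerX's stack frame after it returns. A sketch, with hypothetical names:

```c
/* The captured x travels with the function pointer, by value. */
typedef struct {
    int x;                      /* copied, so it outlives timerX's frame */
    int (*fn)(int x, int y);    /* ordinary, non-nested function */
} Multiplier;

static int times(int x, int y)
{
    return x * y;
}

Multiplier timerX(int x)
{
    Multiplier m = { x, times };
    return m;                   /* returned by value: safe after timerX exits */
}

int call_multiplier(Multiplier m, int y)
{
    return m.fn(m.x, y);
}
```

For example, call_multiplier(timerX(3), 7) yields 21, and two Multiplier values created with different x stay independent.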
I'm pretty sure it works just like any other function, except that it is only visible to the enclosing function.
In other words, it's just related to the visibility or accessibility of the function, and nothing else.
