Variable and executable in a shell interpreter - c

Do you know how to tell the difference between a variable and an executable in a shell interpreter? I don't know how to do that in my lexer.
If anyone has an idea ^^
Thanks,
Have a nice day
Mathieu

In a normal Posix-style shell, the first "word" in a statement which is not a variable assignment is the command to execute. Variable assignments have the form name=value where there cannot be any whitespace around the = and the name is a valid variable name.
Other than in assignments and in arithmetic evaluation context (which is not required for basic shells), any use of a variable must be preceded by a $.
Identifying assignments is contextual, but it is easy to do since the = is mandatory. In a flex-style lexer you could enable and disable assignment recognition with appropriate start conditions, for example.
Without knowing anything more about your strategy for lexical analysis, it's hard to provide a more detailed answer.
If you care about compatibility with Posix shell syntax, the description can be found here.
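As a rough illustration of the rule described in this answer, here is a minimal C sketch that decides whether a word has the name=value form and then finds the command word in a list of words. The function names assignment_name_length and command_word_index are made up for this example, and the sketch ignores quoting and expansion, which a real shell lexer would have to handle.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns the length of the variable name if `word`
 * has the form name=value, where name starts with a letter or underscore
 * and continues with letters, digits, or underscores. Returns 0 otherwise. */
static size_t assignment_name_length(const char *word)
{
    size_t i;

    if (!(isalpha((unsigned char)word[0]) || word[0] == '_'))
        return 0;
    for (i = 1; isalnum((unsigned char)word[i]) || word[i] == '_'; i++)
        ;
    return word[i] == '=' ? i : 0;
}

/* Hypothetical helper: returns the index of the first word that is not an
 * assignment, i.e. the command to execute. Returns `count` if every word
 * is an assignment. */
static size_t command_word_index(const char *const words[], size_t count)
{
    size_t i = 0;

    while (i < count && assignment_name_length(words[i]) > 0)
        i++;
    return i;
}
```

Note that once the command word has been found, later words such as X=3 are ordinary arguments, which is why the recognition is contextual.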

Related

Is there a standard for mentioning variables within comments?

I understand that, in principle, modern programming languages are intended to be used in a manner where the code written is self-documenting.
However, I was taught that on occasion it is necessary to explicitly write a brief precondition, postcondition statement for a function to assert generality. If I had a need to mention a variable by name in the comment is there a standard for denoting that it's a variable?
Please use doxygen for C; it is the de facto standard and worth the effort.
https://www.doxygen.nl/manual/commands.html
\pre { description of the precondition }
Starts a paragraph where the
precondition of an entity can be described. The paragraph will be
indented. The text of the paragraph has no special internal structure.
All visual enhancement commands may be used inside the paragraph.
Multiple adjacent \pre commands will be joined into a single
paragraph. Each precondition will start on a new line. Alternatively,
one \pre command may mention several preconditions. The \pre command
ends when a blank line or some other sectioning command is
encountered.
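For illustration, here is a small sketch of how a precondition and postcondition might be written in a doxygen comment. The function mean is made up for this example. The \p command is doxygen's way of referring to a parameter by name in running text, which is one possible answer to the question of how to mark a variable in a comment.

```c
#include <stddef.h>

/**
 * \brief Computes the arithmetic mean of an array (hypothetical example).
 *
 * \param values Pointer to the first element of the array.
 * \param count  Number of elements in the array.
 * \return The mean of the first \p count elements of \p values.
 *
 * \pre \p values is not NULL and \p count is greater than zero.
 * \post The array pointed to by \p values is not modified.
 */
double mean(const double *values, size_t count)
{
    double sum = 0.0;

    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return sum / (double)count;
}
```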

How to process macros in LEX?

How do I implement #define in yacc/bison?
For Example:
#define f(x) x*x
If anywhere f(x) appears in any function then it is replaced by the right side of the
macro substituting for the argument ‘x’.
For example, f(3) would be replaced with 3*3. The macro can call another macro too.
It's not usually possible to do macro expansion inside a parser, at least not C-style macros, because C-style macro expansion doesn't respect syntax. For example
#define IF if(
#define THEN )
is legal (although very bad style IMHO). But for that to be handled inside the grammar, it would be necessary to allow a macro identifier to appear anywhere in the input, not just where an identifier might be expected. The necessary modifications to the grammar are going to make it much less readable and are very likely to introduce parser action conflicts. [Note 1]
Alternatively, you could do the macro expansion in the lexical analyzer. The lexical analyzer is not a parser, but parsing a C-style macro invocation doesn't require much sophistication, and if macro parameters were not allowed, it would be even simpler. This is how Flex handles macro replacement in its regular expressions ({identifier}, for example). [Note 2] Since Flex macros are just raw character sequences, not token lists as with C-style macros, they can be handled by pushing the replacement text back into the input stream. (F)lex provides the unput special action for this purpose. unput pushes one character back into the input stream, so if you want to push an entire macro replacement, you have to unput it one character at a time, back to front so that the last character unput is the first one to be read afterwards.
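The back-to-front ordering can be sketched in plain C with a small pushback stack. This is only a stand-in for what unput does inside a (f)lex scanner, not flex's own code. The names struct input, input_unput, input_getc and input_unput_string are invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define PUSHBACK_MAX 256

/* Hypothetical input stream with a pushback stack on top of the raw text. */
struct input {
    const char *text;              /* remaining original input */
    char pushback[PUSHBACK_MAX];   /* characters pushed back, used as a stack */
    size_t depth;                  /* number of characters on the stack */
};

/* Analogue of unput: push one character back. Returns -1 if the stack is full. */
static int input_unput(struct input *in, char c)
{
    if (in->depth == PUSHBACK_MAX)
        return -1;
    in->pushback[in->depth++] = c;
    return 0;
}

/* Read the next character: pushed-back characters first, then the raw text.
 * Returns -1 at end of input. */
static int input_getc(struct input *in)
{
    if (in->depth > 0)
        return (unsigned char)in->pushback[--in->depth];
    if (*in->text == '\0')
        return -1;
    return (unsigned char)*in->text++;
}

/* Push a whole replacement text back, last character first, so that its
 * first character is the next one to be read. */
static int input_unput_string(struct input *in, const char *replacement)
{
    for (size_t i = strlen(replacement); i > 0; i--) {
        if (input_unput(in, replacement[i - 1]) != 0)
            return -1;
    }
    return 0;
}
```

If the scanner has just consumed a macro invocation and the remaining input is " + 1", pushing back "3*3" this way makes the following reads yield "3*3 + 1".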
That's workable but ugly. And it's not really scalable to even the small feature list provided by the C preprocessor. And it violates the fundamental principle of software design, which is that each component does just one thing (so that it can do it well).
So that leaves the most common approach, which is to add a separate macro processor component, so that instead of dividing the parse into lexical scan/syntax analysis, the parse becomes lexical scan/macro expansion/syntax analysis. [Note 3]
A C-style macro processor which works between the lexical analyser and the syntactic analyser could itself be written in Bison. As I mentioned above, the parsing requirements are generally minimal, but there is still parsing to be done and Bison is presumably already part of the project. Although I don't know of any macro processor (other than proof-of-concept programs I've written myself) that does this, I think it's a very flexible solution. In particular, the Bison syntactic analysis phase could be implemented with a push-parser, which avoids the need to produce the entire macro-expanded token stream in order to make it available to a traditional pull-parser.
That's not the only way to design macros, though. Indeed, it has a lot of shortcomings, because the macro expansions are not hygienic, respecting neither syntax nor scope. Probably anyone who has used C macros has at one time or other been bitten by these problems; the simplest manifestation is defining a macro like:
#define NEXT(a) a + 1
and then writing
int x = NEXT(a) * 3;
which is not going to produce the expected result (unless what is expected is a violation of the syntactic form of the last statement). Also, any macro expansion which needs to use a local variable will sooner or later produce an incorrect expansion because of unexpected name collision. Hygienic macro expansion seeks to solve these issues by viewing macro expansion as an operation on syntax trees, not token streams, making the parsing paradigm lexical scan/syntax analysis/macro expansion (of the parse tree). For that operation, the appropriate tool might well be some kind of tree parser.
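To make the NEXT problem concrete, here is a short sketch. The parenthesized variant NEXT_PAREN and the two wrapper functions are additions for illustration only. Parentheses are the conventional C workaround for the precedence problem, but they do nothing for name collisions, so they are not a substitute for hygiene.

```c
/* The macro from the text: the expansion is spliced in as raw tokens. */
#define NEXT(a) a + 1

/* Hypothetical parenthesized variant, the usual C workaround. */
#define NEXT_PAREN(a) ((a) + 1)

/* Expands to `a + 1 * 3`, so the multiplication binds to the 1 only. */
static int next_times_three(int a)
{
    return NEXT(a) * 3;
}

/* Expands to `((a) + 1) * 3`, which is what was probably intended. */
static int next_paren_times_three(int a)
{
    return NEXT_PAREN(a) * 3;
}
```

With a equal to 2, the first function returns 5 (2 + 1 * 3) and the second returns 9.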
Notes
1. Also, you'd want to remove the token from the parse tree. Yacc/bison does have a poorly-documented feature, YYBACKUP, which might possibly help accomplish this. I don't know if that's one of its intended use cases; indeed, it is not clear to me what its intended use cases are.
2. The (f)lex documentation calls these definitions, but they really are macros, and they suffer from all the usual problems macros bring with them, such as mysterious interactions with surrounding syntax.
3. Another possibility is macro expansion/lexical scan/syntax analysis, which could be implemented using a macro processor like M4. But that completely divorces the macros from the rest of the language.
Yacc and lex generate C source in the end, so you can use macros inside the parser and lexer actions.
The actual #define preprocessor directives can go in the first section of the lexer and parser file:
%{
// Somewhere here
#define f(x) x*x
%}
These sections will be copied verbatim to the generated C source.

When does macro substitution happen in C

I was reading the book "Compilers: Principles, Techniques, and Tools (2nd Edition)" by Alfred V. Aho. There is an example in this book (example 1.7) which asks to analyze the scope of x in the following macro definition in C:
#define a (x+1)
From this example,
We cannot resolve x statically, that is, in terms of the program text.
In fact, in order to interpret x, we must use the usual dynamic-scope
rule. We examine all the function calls that are currently active, and
we take the most recently called function that has a declaration of x.
It is to this declaration that the use of x refers.
I've become confused reading this - as far as I know, macro substitution happens in the preprocessing stage, before compilation starts. But if I get it right, the book says it happens when the program is getting executed. Can anyone please clarify this?
The macro itself has no notion of scope, at least not in the same sense as the C language has. Wherever the symbol a appears in the source after the #define (and before a possible #undef) it is replaced by (x + 1).
But the text talks about the scope of x, the symbol in the macro substitution. That is interpreted by the usual C rules. If there is no symbol x in the scope where a was substituted, this is a compilation error.
The macro is not self-contained. It uses a symbol external to the macro, some kind of global variable if you will, but one whose meaning will change according to the place in the source text where the macro is invoked. I think what the quoted text wants to say is that we cannot know what macro a does unless we know where it is invoked.
I've become confused reading this - as far as I know, macro substitution happens in preprocessing stage, before compilation starts.
Yes, this is how a compiler works.
But if I get it right, the book says it happens when the program is getting executed. Can anyone please clarify this?
Speaking without referring to the book, there are other forms of program analysis besides translating source code to object code (a.k.a. compilation). A C compiler replaces macros before compiling, thus losing information about what was originally a macro, because that information is not significant to the rest of the translation process. The question of the scope of x within the macro never comes up, so the compiler may ignore the issue.
Debuggers often implement tighter integration with source code, though. One could conceive of a debugger that points at subexpressions while stepping through the program (I have seen this feature in an embedded toolchain), and furthermore points inside macros which generate expressions (this I have never seen, but it's conceivable). Or, some debuggers allow you to point at any identifier and see its value. Pointing at the macro definition would then require resolving the identifiers used in the macro, as Aho et al discuss there.
It's difficult to be sure without seeing more context from the book, but I think that passage is at least unclear, and probably incorrect. It's basically correct about how macro definitions work, but not about how the name x is resolved.
#define a (x+1)
C macros are expanded early in the compilation process, in translation phase 4 of 8, as specified in N1570 5.1.1.2. Variable names aren't resolved until phase 7.
So the name x will be meaningfully visible to the compiler, not at the point where the macro is defined, but at the point in the source code where the macro a is used. Two different uses of the a macro could refer to two different declarations of variables named x.
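A small sketch of that point: the same macro a picks up whichever declaration of x is statically visible where it is used. The three functions are invented for this example, and the #undef at the end is only there to keep the one-letter macro name from leaking into later code.

```c
#define a (x+1)

static int x = 10;                 /* file-scope x */

/* Here `a` expands to (x+1) with the file-scope x. */
static int uses_file_scope_x(void)
{
    return a;
}

/* Here the parameter x hides the file-scope x. */
static int uses_parameter_x(int x)
{
    return a;
}

/* Here a block-scope x hides the file-scope x. */
static int uses_block_scope_x(void)
{
    int x = 100;
    return a;
}

#undef a   /* the macro is no longer replaced after this point */
```

Each use is resolved from the program text alone. Which functions happen to be active on the call stack at run time plays no part.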
We cannot resolve x statically, that is, in terms of the program text.
We cannot resolve it at the point of the macro definition.
In fact, in order to interpret x, we must use the usual dynamic-scope
rule. We examine all the function calls that are currently active, and
we take the most recently called function that has a declaration of x.
It is to this declaration that the use of x refers.
This is not correct for C. When the compiler sees a reference to x, it must determine what declaration it refers to (or issue a diagnostic if there is no such declaration). That determination does not depend on currently active function calls, something that can only be determined at run time. C is statically scoped, meaning that the appropriate declaration of x can be determined entirely by examining the program text.
At compile time, the compiler will examine symbol table entries for the current block, then for the enclosing block, then for the current function (x might be the name of a parameter), then for file scope.
There are languages that use dynamic scoping, where the declaration a name refers to depends on the current run-time call stack. C is not one of them.
Here's an example of dynamic scoping in Perl (note that this is considered poor style):
#!/usr/bin/perl
use strict;
use warnings;
no strict "vars";
sub inner {
    print " name=\"$name\"\n";
}
sub outer1 {
    local($name) = "outer1";
    print "outer1 calling inner\n";
    inner();
}
sub outer2 {
    local($name) = "outer2";
    print "outer2 calling inner\n";
    inner();
}
outer1();
outer2();
The output is:
outer1 calling inner
name="outer1"
outer2 calling inner
name="outer2"
A similar program in C would be invalid, since the declaration of name would not be statically visible in the function inner.

Bison - additional parameter to a push and pure parser

How can I pass one additional parameter (not the token minor of type YYSTYPE) to the yypush_parse() function?
The parser is indeed reentrant, but this one additional variable is crucial for the thread-safety of the application I need to integrate my parser in (it's a PHP extension, so we're talking about TSRM).
I cannot just get rid of that parameter because inside the action code I'm going to call functions which will generate an AST in a userland-accessible form.
I've tried to hack around YYPUSH_DECLS and it works as far as declaring the function is concerned, BUT a few thousand LOCs down comes the implementation of yypush_parse, and I can't see any way to overwrite the function signature where the implementation of yypush_parse starts.
YYPARSE_PARAM is only used when the parser is not a push one (as far as I can tell), but in my case I NEED it to be a push parser because of the things I have to do in the processing loop, after lexing and prior to adding a new token to the parsing stack.
So I am wondering if there's a %directive or something that may help.
On the other side, I really think YYPARSE_PARAM should be used as far as it's defined, no matter what type of parser it is. It's a pity it's not.
Use the %parse-param directive. YYPARSE_PARAM is deprecated and shouldn't be used.

C Macro to Override Variable Assignment with Function Call

Calling all C macro gurus...
Is there any way to write a C macro that will replace something like this:
my_var = 5;
with this:
setVar(&my_var, 5);
In other words, can I write a C macro that will override assignments for a specific variable (in the above example, my_var) and instead pass it to a function whose job it is to set that variable? If possible, I'd like to be able to hook into assignments of a specific variable.
EDIT: After thinking about this some more, I'm not sure it could be done. Even if you can come up with a macro to do it, setVar wouldn't necessarily know the type of the variable it's setting, so what would be the type of its second argument?
EDIT: The reason I'd like to hook assignments of specific variables is for use in a primitive debugger for some specialized embedded C code. It would be nice to be able to have a "watch list", essentially like you have in an IDE. My first instinct was to try to hook variable assignments with a C macro so you could just drop the macro into your code and have that variable "watched", but then again I've never really written a debugger before so maybe I'm going about that all wrong.
Not with the standard preprocessor. It cannot change the parsing of the file, only replace proper names with a piece of code (and "=" isn't valid in a name).
If you're feeling adventurous, you can try to replace the executable "cpp" with a small script which pre-processes the source code. But that might wreak havoc with the debugging information (the file name and, if you're replacing one line of code with several, the line number information, too). The script would call "sed":
sed -E 's/my_var[[:space:]]*=[[:space:]]*([^;]+);/MY_VAR(my_var, \1);/' file.c > file_tmp.c
But your best bet is probably to put this into a script and simply run it on all your sources. This will change the code and you'll see what is happening in your debugger.
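The answer does not say what MY_VAR should expand to. One possible definition, purely as a sketch for the "watch list" idea from the question, performs the assignment and then reports it. The names watch_log and watch_count are hypothetical, and the cast to long assumes the watched variable has an integer type.

```c
#include <stdio.h>

/* Hypothetical counter of watched assignments, handy for testing. */
static int watch_count = 0;

/* Hypothetical hook: report one assignment to a watched variable. */
static void watch_log(const char *name, long value, const char *file, int line)
{
    watch_count++;
    fprintf(stderr, "%s:%d: %s = %ld\n", file, line, name, value);
}

/* Assign, then log the variable's name, new value, and source location.
 * The do/while wrapper makes the macro behave like a single statement. */
#define MY_VAR(var, value)                                         \
    do {                                                           \
        (var) = (value);                                           \
        watch_log(#var, (long)(var), __FILE__, __LINE__);          \
    } while (0)
```

After the sed rewrite, a line such as my_var = 5; becomes MY_VAR(my_var, 5);, which sets my_var and prints one line to stderr.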
#define setVar(_left_, _right_) (*(_left_) = (_right_))
