I am currently working on a POSIX shell implementation in C that should work as described by SCL.
For now I have created the following lexer:
struct lexer
{
char cur_char;
long cur_pos;
FILE* stream;
};
struct lexer *init_lexer(FILE* stream);
struct token *get_next_token(struct lexer *lexer);
enum token_type get_next_token_type(struct lexer *lexer);
int is_empty(struct lexer *lexer);
void free_lexer(struct lexer *lexer);
I am using this lexer to creates AST (I am creating my AST recursively) and I have a parse function for each grammar rule (Only 3 for now: SIMPLE_COMMAND, COMMAND_LIST IF_COMMAND).
I am using this function to decide what type of node should I generate:
struct ast_node *parse(struct lexer *lexer)
{
if (lexer == NULL)
return NULL;
enum token_type type = get_next_token_type(lexer);
struct ast_node *base_node;
switch (type)
{
case WORD:
base_node = (struct ast_node *)parse_command_list(lexer);
break;
case NEWLINE:
return NULL;
case IF:
base_node = (struct ast_node *)parse_if_command(lexer);
break;
default:
base_node = NULL;
break;
}
return base_node;
}
I have some problem with that function the first is that:
if I get the first token from the lexer and then I parse the command the first will not be available in the parsing function (I added get_next_token_type that get the type of the next token without advancing in the lexer to fix this for now).
I have been wondering if there would be a better way to decide which grammar rule parse.
Also, I am not sure how I would parse pipeline command because I will have to know that I have a pipe before creating the AST of the tokens at the left of the pipe .
I was wondering if keeping token in a stack or add a get_prev_token function in the lexer could be interesting?
Thank you for your help, have a nice day!
Edit: Also if you would have some documentation it could really help me the only helpful documentation I found for now is this on :https://shell.multun.net/
Related
Program is intended to store values in a symbol table and then have them be able to be printed out stating the part of speech. Further to be parsed and state more in the parser, whether it is a sentence and more.
I create the executable file by
flex try1.l
bison -dy try1.y
gcc lex.yy.c y.tab.c -o try1.exe
in cmd (WINDOWS)
My issue occurs when I try to declare any value when running the executable,
verb run
it goes like this
BOLD IS INPUT
verb run
run
run
syntax error
noun cat
cat
syntax error
run
run
syntax error
cat run
syntax error
MY THOUGHTS: I'm unsure why I'm getting this error back from the code Syntax error. Although after debugging and trying to print out what value was being stored, I figured there has to be some kind of issue with the linked list. As it seemed only one value was being stored in the linked list and causing an error of sorts. As I tried to print out the stored word_type integer value for run and it would print out the correct value 259, but would refuse to let me define any other words to my symbol table. I reversed the changes of the print statements and now it works as previously stated. I think again there is an issue with the addword method as it isn't properly being added so the lookup method is crashing the program.
Lexer file, this example is taken from O'Reily 2nd edition on Lex And Yacc,
Example 1-5,1-6.
Am trying to learn Lex and Yacc on my own and reproduce this example.
%{
/*
* We now build a lexical analyzer to be used by a higher-level parser.
*/
#include <stdlib.h>
#include <string.h>
#include "ytab.h" /* token codes from the parser */
#define LOOKUP 0 /* default - not a defined word type. */
int state;
%}
/*
* Example from page 9 Word recognizer with a symbol table. PART 2 of Lexer
*/
%%
\n { state = LOOKUP; } /* end of line, return to default state */
\.\n { state = LOOKUP;
return 0; /* end of sentence */
}
/* whenever a line starts with a reserved part of speech name */
/* start defining words of that type */
^verb { state = VERB; }
^adj { state = ADJ; }
^adv { state = ADV; }
^noun { state = NOUN; }
^prep { state = PREP; }
^pron { state = PRON; }
^conj { state = CONJ; }
[a-zA-Z]+ {
if(state != LOOKUP) {
add_word(state, yytext);
} else {
switch(lookup_word(yytext)) {
case VERB:
return(VERB);
case ADJECTIVE:
return(ADJECTIVE);
case ADVERB:
return(ADVERB);
case NOUN:
return(NOUN);
case PREPOSITION:
return(PREPOSITION);
case PRONOUN:
return(PRONOUN);
case CONJUNCTION:
return(CONJUNCTION);
default:
printf("%s: don't recognize\n", yytext);
/* don't return, just ignore it */
}
}
}
. ;
%%
int yywrap()
{
return 1;
}
/* define a linked list of words and types */
struct word {
char *word_name;
int word_type;
struct word *next;
};
struct word *word_list; /* first element in word list */
extern void *malloc() ;
int
add_word(int type, char *word)
{
struct word *wp;
if(lookup_word(word) != LOOKUP) {
printf("!!! warning: word %s already defined \n", word);
return 0;
}
/* word not there, allocate a new entry and link it on the list */
wp = (struct word *) malloc(sizeof(struct word));
wp->next = word_list;
/* have to copy the word itself as well */
wp->word_name = (char *) malloc(strlen(word)+1);
strcpy(wp->word_name, word);
wp->word_type = type;
word_list = wp;
return 1; /* it worked */
}
int
lookup_word(char *word)
{
struct word *wp = word_list;
/* search down the list looking for the word */
for(; wp; wp = wp->next) {
if(strcmp(wp->word_name, word) == 0)
return wp->word_type;
}
return LOOKUP; /* not found */
}
Yacc file,
%{
/*
* A lexer for the basic grammar to use for recognizing English sentences.
*/
#include <stdio.h>
%}
%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION
%%
sentence: subject VERB object{ printf("Sentence is valid.\n"); }
;
subject: NOUN
| PRONOUN
;
object: NOUN
;
%%
extern FILE *yyin;
main()
{
do
{
yyparse();
}
while (!feof(yyin));
}
yyerror(s)
char *s;
{
fprintf(stderr, "%s\n", s);
}
Header file, had to create 2 versions for some values not sure why but code was having an issue with them, and I wasn't understanding why so I just created a token with the full name and the shortened as the book had only one for each.
# define NOUN 257
# define PRON 258
# define VERB 259
# define ADVERB 260
# define ADJECTIVE 261
# define PREPOSITION 262
# define CONJUNCTION 263
# define ADV 260
# define ADJ 261
# define PREP 262
# define CONJ 263
# define PRONOUN 258
If you feel that there is a problem with your linked list implementation, you'd be a lot better off testing and debugging it with a simple driver program rather than trying to do that with some tools (flex and bison) which you are still learning. On the whole, the simpler a test is and the fewest dependencies which it has, the easier it is to track down problems. See this useful essay by Eric Clippert for some suggestions on debugging.
I don't understand why you felt the need to introduce "short versions" of the token IDs. The example code in Levine's book does not anywhere use these symbols. You cannot just invent symbols and you don't need these abbreviations for anything.
The comment that you "had to create 2 versions [of the header file] for some values" but that the "code was having an issue with them, and I wasn't understanding why" is far too unspecific for an answer. Perhaps the problem was that you thought you could use identifiers which are not defined anywhere, which would certainly cause a compiler error. But if there is some other issue, you could ask a question with an accurate problem description (that is, exactly what problem you encountered) and a Minimal, Complete, and Verifiable example (as indicated in the StackOverflow help pages).
In any case, manually setting the values of the token IDs is almost certainly preventing you from being able to recognized inputs. Bison/yacc reserves the values 256 and 257 for internal tokens, so the first one which will be generated (and therefore used in the parser) has value 258. That means that the token values you are returning from your lexical scanner have a different meaning inside bison. Bottom line: Never manually set token values. If your header isn't being generated correctly, figure out why.
As far as I can see, the only legal input for your program has the form:
sentence: subject VERB object
Since none of your sample inputs ("run", for example) have this form, a syntax error is not surprising. However, the fact that you receive a very early syntax error on the input "cat" does suggest there might be a problem with your symbol table lookup. (That's probably the result of the problem noted above.)
I am attempting to wrap a C library (which I did not write, and whose interfaces cannot be changed) using SWIG. It's mostly straightforward, but there's one field of one struct that's giving me trouble. The relevant struct definition looks like this:
struct Token {
const char *buffer;
const char *word;
unsigned short wordlen;
// ... other fields ...
};
buffer is a normal C string and should be exposed normally (but immutably). word is the problem field. It is a pointer to somewhere within the buffer string, and is meant to be understood as a string of length wordlen. I want to expose this to high-level languages as a normal read-only string, so they don't always have to be taking a slice.
I think the way to handle this is with an "out" typemap specifically for Token::word, something like this:
struct Token {
%typemap (out) const char *word {
$result = SWIG_FromCharPtrAndSize($1, ?wordlen?);
}
}
and this is where I got stuck: How do I access the wordlen field of the parent structure from this typemap?
Or if there's a better way to handle this entire issue, please tell me about that instead.
Unfortunately, it looks like SWIG has no support for mapping multiple structure members at the same time. Inspecting the generated output we learn that (arg1) points to the input structure. Thus, we have to:
Make the word immutable, so that the set wrapper is not generated.
Import the SWIG_FromCharPtrAndSize fragment - it's not available by default.
Map word using SWIG_FromCharPtrAndSize as you wished, referring to (arg1)->wordlen.
Skip wordlen, so that it's not mapped (either by %ignore-ing it, or not providing it in the struct visible to SWIG).
The following is a complete example. First, the header:
// main.h
#pragma once
struct Token {
const char *word;
unsigned short wordlen;
};
struct Token *make_token(void);
extern char *word_check;
And the SWIG module - note that we use the header verbatim, only overriding the definition of struct Token:
// token_mod.i
%module token_mod
%{#include "main.h"%}
%ignore Token;
%include "main.h"
%rename("%s") Token;
struct Token {
%immutable word;
%typemap (out, fragment="SWIG_FromCharPtrAndSize") const char *word {
$result = SWIG_FromCharPtrAndSize($1, (arg1)->wordlen);
}
const char *word;
%typemap (out) const char *word;
};
The demo code that uses Python to check that things work:
// https://github.com/KubaO/stackoverflown/tree/master/questions/swig-pair-53915787
#include <assert.h>
#include <stdlib.h>
#include <Python.h>
#include "main.h"
struct Token *make_token(void) {
struct Token *r = malloc(sizeof(struct Token));
r->word = "1234";
r->wordlen = 2;
return r;
}
char *word_check;
#if PY_VERSION_HEX >= 0x03000000
# define SWIG_init PyInit__token_mod
PyObject*
#else
# define SWIG_init init_token_mod
void
#endif
SWIG_init(void);
int main()
{
PyImport_AppendInittab("_token_mod", SWIG_init);
Py_Initialize();
PyRun_SimpleString(
"import sys\n"
"sys.path.append('.')\n"
"import token_mod\n"
"from token_mod import *\n"
"token = make_token()\n"
"cvar.word_check = token.word\n");
assert(word_check && strcmp(word_check, "12") == 0);
Py_Finalize();
return 0;
}
Finally, the CMakeLists.txt that makes the demo - it can be used with Python 2.7 or 3.x. Note: To switch Python versions, the build directory must be wiped (or at least the cmake caches in it have to be wiped).
cmake_minimum_required(VERSION 3.2)
set(Python_ADDITIONAL_VERSIONS 3.6)
project(swig-pair)
find_package(SWIG 3.0 REQUIRED)
find_package(PythonLibs 3.6 REQUIRED)
include(UseSwig)
SWIG_MODULE_INITIALIZE(${PROJECT_NAME} python)
SWIG_ADD_SOURCE_TO_MODULE(${PROJECT_NAME} swig_generated_sources "token_mod.i")
add_executable(${PROJECT_NAME} "main.c" ${swig_generated_sources})
target_include_directories(${PROJECT_NAME} PRIVATE ${PYTHON_INCLUDE_DIRS} ".")
target_link_libraries(${PROJECT_NAME} PRIVATE ${PYTHON_LIBRARIES})
The higher level languages don't care that wordlen is the size of word. Only the C does. If you cant change the C that you are swiging then you have to leave it as is and remember as you write in the higher languages that the char has a size. Also Swig and const don't like each other. Hereis there documentation on consts
Hi i am trying to run John code from lex and yacc book by R. Levine i have compiled the lex and yacc program in linux using the commands
lex example.l
yacc example.y
gcc -o example y.tab.c
./example
the program asks the user for input of verbs,nouns,prepositions e.t.c in the format
verb accept,admire,reject
noun jam,pillow,knee
and then runs the grammar in yacc to check if it's a simple or compound sentence
but when i type
jam reject knee
it shows noting on screen where it is supposed to show the line "Parsed a simple sentence." on parsing.The code is given below
yacc file
%{
#include <stdio.h>
/* we found the following required for some yacc implementations. */
/* #define YYSTYPE int */
%}
%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION
%%
sentence: simple_sentence { printf("Parsed a simple sentence.\n"); }
| compound_sentence { printf("Parsed a compound sentence.\n"); }
;
simple_sentence: subject verb object
| subject verb object prep_phrase
;
compound_sentence: simple_sentence CONJUNCTION simple_sentence
| compound_sentence CONJUNCTION simple_sentence
;
subject: NOUN
| PRONOUN
| ADJECTIVE subject
;
verb: VERB
| ADVERB VERB
| verb VERB
;
object: NOUN
| ADJECTIVE object
;
prep_phrase: PREPOSITION NOUN
;
%%
extern FILE *yyin;
main()
{
while(!feof(yyin)) {
yyparse();
}
}
yyerror(s)
char *s;
{
fprintf(stderr, "%s\n", s);
}
lex file
%{
/*
* We now build a lexical analyzer to be used by a higher-level parser.
*/
#include "ch1-06y.h" /* token codes from the parser */
#define LOOKUP 0 /* default - not a defined word type. */
int state;
%}
%%
\n { state = LOOKUP; }
\.\n { state = LOOKUP;
return 0; /* end of sentence */
}
^verb { state = VERB; }
^adj { state = ADJECTIVE; }
^adv { state = ADVERB; }
^noun { state = NOUN; }
^prep { state = PREPOSITION; }
^pron { state = PRONOUN; }
^conj { state = CONJUNCTION; }
[a-zA-Z]+ {
if(state != LOOKUP) {
add_word(state, yytext);
} else {
switch(lookup_word(yytext)) {
case VERB:
return(VERB);
case ADJECTIVE:
return(ADJECTIVE);
case ADVERB:
return(ADVERB);
case NOUN:
return(NOUN);
case PREPOSITION:
return(PREPOSITION);
case PRONOUN:
return(PRONOUN);
case CONJUNCTION:
return(CONJUNCTION);
default:
printf("%s: don't recognize\n", yytext);
/* don't return, just ignore it */
}
}
}
. ;
%%
/* define a linked list of words and types */
struct word {
char *word_name;
int word_type;
struct word *next;
};
struct word *word_list; /* first element in word list */
extern void *malloc();
int
add_word(int type, char *word)
{
struct word *wp;
if(lookup_word(word) != LOOKUP) {
printf("!!! warning: word %s already defined \n", word);
return 0;
}
/* word not there, allocate a new entry and link it on the list */
wp = (struct word *) malloc(sizeof(struct word));
wp->next = word_list;
/* have to copy the word itself as well */
wp->word_name = (char *) malloc(strlen(word)+1);
strcpy(wp->word_name, word);
wp->word_type = type;
word_list = wp;
return 1; /* it worked */
}
int
lookup_word(char *word)
{
struct word *wp = word_list;
/* search down the list looking for the word */
for(; wp; wp = wp->next) {
if(strcmp(wp->word_name, word) == 0)
return wp->word_type;
}
return LOOKUP; /* not found */
}
header file
# define NOUN 257
# define PRONOUN 258
# define VERB 259
# define ADVERB 260
# define ADJECTIVE 261
# define PREPOSITION 262
# define CONJUNCTION 263
You have several problems:
The build details you describe do not follow the usual pattern, and in fact they do not work for the code you provide.
Having sorted out how to build your program, it does not work at all, instead segfaulting before reading any input.
Having solved that problem, your expectation of the program's behavior with the given input is incorrect in at least two ways.
With respect to the build:
yacc builds C source for a parser and optionally a header file containing corresponding token definitions. It is usual to exercise the option to get the definitions, and to #include their header in the lexer's source file (#include 'y.tab.h'):
yacc -d example.y
lex builds C source for a lexical analyzer. This can be done either before of after yacc, as lex does not depend directly on the token definitions:
lex example.l
The two generated C source files must be compiled and linked together, possibly with other sources as well, and possibly with libraries. In particular, it is often convenient to link in libl (or libfl if your lex is really GNU flex). I linked the latter to get the default yywrap():
gcc -o example lex.yy.c y.tab.c -lfl
With respect to the segfault:
Your generated program is built around this:
extern FILE *yyin;
main()
{
while(!feof(yyin)) {
yyparse();
}
}
In the first place, you should read Why is “while ( !feof (file) )” always wrong?. Having had that under consideration might have spared you from committing a much more fundamental mistake: evaluating yyin before it has been set. Although it's true that yyin will be set to stdin if you don't set it to something else, that cannot happen at program initialization because stdin is not a compile-time constant. Therefore, when control first reaches the loop control expression, yyin's value is still NULL, and a segfault results.
It would be safe and make more sense to test for end of file after yyparse() returns.
With respect to behavioral expectations
You complained that the input
verb accept,admire,reject
noun jam,pillow,knee
jam reject knee
does not elicit any output from the program, but that's not exactly true. That input does not elicit output from the program when it is entered interactively, without afterward sending an end-of-file signal (i.e. by typing control-D at the beginning of a line).
The parser not yet having detected end-of-file in that case (and not paying any attention at all to newlines, since your lexer notifies it about them only when they immediately follow a period), it has no reason to attempt to reduce its token stack to the start symbol. It could be that you will continue with an object to extend the simple sentence, and it cannot be sure that it won't see a CONJUNCTION next, eithger. It doesn't print anything because it's waiting for more input. If you end the sentence with a period or afterward send a control-D then it will in fact print "Parsed a simple sentence."
I'm having a problem using (reentrant) Flex + Lemon for parsing. I'm using a simple grammar and lexer here. When I run it, I'll put in a number followed by an EOF token (Ctrl-D). The printout will read:
89
found int of .
AST=0.
Where the first line is the number I put in. Theoretically, the AST value should be the sum of everything I put in.
EDIT: when I call Parse() manually it runs correctly.
Also, lemon appears to run the atom ::= INT rule even when the token is 0 (the stop token). Why is this? I'm very confused about this behavior and I can't find any good documentation.
Okay, I figured it out. The reason is that there is a particularly nasty (and poorly documented) interaction going on between flex and lemon.
In an attempt to save memory, lemon will hold onto a token without copying, and push it on to an internal token stack. However, flex also tries to save memory by changing the value that yyget_text points to as it lexes the input. The offending line in my example is:
// in the do loop of main.c...
Parse(parser, token, yyget_text(lexer));
This should be:
Parse(parser, token, strdup(yyget_text(lexer)));
which will ensure that the value that lemon points to when it reduces the token stack later is the same as what you originally passed in.
(Note: Don't forget, strdup means you'll have to free that memory at some point later. Lemon will let you write token "destructors" that can do this, or if you're building an AST tree you should wait until the end of the AST lifetime.)
You might also try making a token type that contains a pointer to the string and the length of the string. I've had success with this.
token.h
#ifndef Token_h
#define Token_h
typedef struct Token {
int code;
char * string;
int string_length;
} Token;
#endif // Token_h
main.c
int main(int argc, char** argv) {
// Set up the scanner
yyscan_t scanner;
yylex_init(&scanner);
yyset_in(stdin, scanner);
// Set up the parser
void* parser = ParseAlloc(malloc);
// Do it!
Token t;
do {
t.code = yylex(scanner);
t.string = yyget_text(scanner);
t.string_length = yyget_leng(scanner);
Parse(parser, t.code, t);
} while (t.code > 0);
if (-1 == t.code) {
fprintf(stderr, "The scanner encountered an error.\n");
}
// Cleanup the scanner and parser
yylex_destroy(scanner);
ParseFree(parser, free);
return 0;
}
language.y (excerpt)
class_interface ::= INTERFACE IDENTIFIER(A) class_inheritance END.
{
printf("defined class %.*s\n", A.string_length, A.string);
}
See my printf statement there? I'm using the string and the length to print out my token.
Im trying to assign a function inside a struct, so far I have this code:
typedef struct client_t client_t, *pno;
struct client_t
{
pid_t pid;
char password[TAM_MAX]; // -> 50 chars
pno next;
pno AddClient()
{
/* code */
}
};
int main()
{
client_t client;
// code ..
client.AddClient();
}
**Error**: *client.h:24:2: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘{’ token.*
Which is the correct way to do it ?
It can't be done directly, but you can emulate the same thing using function pointers and explicitly passing the "this" parameter:
typedef struct client_t client_t, *pno;
struct client_t
{
pid_t pid;
char password[TAM_MAX]; // -> 50 chars
pno next;
pno (*AddClient)(client_t *);
};
pno client_t_AddClient(client_t *self) { /* code */ }
int main()
{
client_t client;
client.AddClient = client_t_AddClient; // probably really done in some init fn
//code ..
client.AddClient(&client);
}
It turns out that doing this, however, doesn't really buy you an awful lot. As such, you won't see many C APIs implemented in this style, since you may as well just call your external function and pass the instance.
As others have noted, embedding function pointers directly inside your structure is usually reserved for special purposes, like a callback function.
What you probably want is something more like a virtual method table.
typedef struct client_ops_t client_ops_t;
typedef struct client_t client_t, *pno;
struct client_t {
/* ... */
client_ops_t *ops;
};
struct client_ops_t {
pno (*AddClient)(client_t *);
pno (*RemoveClient)(client_t *);
};
pno AddClient (client_t *client) { return client->ops->AddClient(client); }
pno RemoveClient (client_t *client) { return client->ops->RemoveClient(client); }
Now, adding more operations does not change the size of the client_t structure. Now, this kind of flexibility is only useful if you need to define many kinds of clients, or want to allow users of your client_t interface to be able to augment how the operations behave.
This kind of structure does appear in real code. The OpenSSL BIO layer looks similar to this, and also UNIX device driver interfaces have a layer like this.
How about this?
#include <stdio.h>
typedef struct hello {
int (*someFunction)();
} hello;
int foo() {
return 0;
}
hello Hello() {
struct hello aHello;
aHello.someFunction = &foo;
return aHello;
}
int main()
{
struct hello aHello = Hello();
printf("Print hello: %d\n", aHello.someFunction());
return 0;
}
This will only work in C++. Functions in structs are not a feature of C.
Same goes for your client.AddClient(); call ... this is a call for a member function, which is object oriented programming, i.e. C++.
Convert your source to a .cpp file and make sure you are compiling accordingly.
If you need to stick to C, the code below is (sort of) the equivalent:
typedef struct client_t client_t, *pno;
struct client_t
{
pid_t pid;
char password[TAM_MAX]; // -> 50 chars
pno next;
};
pno AddClient(pno *pclient)
{
/* code */
}
int main()
{
client_t client;
//code ..
AddClient(client);
}
You are trying to group code according to struct.
C grouping is by file.
You put all the functions and internal variables in a header or
a header and a object ".o" file compiled from a c source file.
It is not necessary to reinvent object-orientation from scratch
for a C program, which is not an object oriented language.
I have seen this before.
It is a strange thing. Coders, some of them, have an aversion to passing an object they want to change into a function to change it, even though that is the standard way to do so.
I blame C++, because it hid the fact that the class object is always the first parameter in a member function, but it is hidden. So it looks like it is not passing the object into the function, even though it is.
Client.addClient(Client& c); // addClient first parameter is actually
// "this", a pointer to the Client object.
C is flexible and can take passing things by reference.
A C function often returns only a status byte or int and that is often ignored.
In your case a proper form might be
/* add client to struct, return 0 on success */
err = addClient( container_t cnt, client_t c);
if ( err != 0 )
{
fprintf(stderr, "could not add client (%d) \n", err );
}
addClient would be in Client.h or Client.c
You can pass the struct pointer to function as function argument.
It called pass by reference.
If you modify something inside that pointer, the others will be updated to.
Try like this:
typedef struct client_t client_t, *pno;
struct client_t
{
pid_t pid;
char password[TAM_MAX]; // -> 50 chars
pno next;
};
pno AddClient(client_t *client)
{
/* this will change the original client value */
client.password = "secret";
}
int main()
{
client_t client;
//code ..
AddClient(&client);
}