How do I get yacc/bison and/or lex/flex to restart scanning after something like token substitution? - c

Is there a way to force bison and/or flex to restart scanning after I replace some token with something else?
My particular example would be with replacement for a specific word/string. If I want a word of hello to be replaced by echo hello, how can I get flex or bison to replace hello and then start parsing again (to pick up 2 words instead of just one). So it would be like:
Get token WORD (which is a string type)
If hello, replace token value with echo hello
Restart parsing entire input (which is now echo hello)
Get token WORD (echo)
Get token WORD (hello)
I've seen very tempting functions like yyrestart(), but I don't really understand what that function in particular really accomplishes. Any help is greatly appreciated, thanks!
Update 4/23/2010
One kind of hack-and-slash solution I've ended up using: for each word that comes through, I check an "alias" array. If the word has an alias, I replace the value of the word (using, for example, strcpy($1,aliasval)) and set an aliasfound flag.
Once the entire line of input is parsed once, if the aliasfound flag is true, I use yy_scan_string() to switch the buffer state to the input with expanded aliases, and call YYACCEPT.
So then it jumps out to the main function and I call yyparse() again, with the buffer still pointing to my string. This continues until no aliases are found. Once all of my grammar actions are complete, I call yyrestart(stdin) to go back to "normal" mode.
If anyone knows how I can effectively expand my words w/ their alias values, inject into stdin (or some other method), and basically expand all aliases (even nested) as I go, that would be awesome. I was playing around with yypush_buffer_state() and yypop_buffer_state(), along with yy_switch_to_buffer(), but I couldn't get "inline" substitution with continued parsing working...

It seems to me that the place to fix this is the lexer. I would suggest using flex, which supports a state machine (called "Start Conditions" in the flex documentation). You change states using BEGIN, and the states need to be defined in the definitions section.
So, for example, you could have a rule like
<INITIAL>hello BEGIN(in_echo); yyless(0); return (WORD_ECHO);
<in_echo>hello BEGIN(0); return (WORD_HELLO);
yyless(n) keeps only the first n characters of the current token in yytext and pushes the rest back, so yyless(0) puts the entire matched text back into the input stream to be scanned again.
I haven't tried this out myself, but I think this is the structure of the solution you want.

Adding an "answer" based on what I ended up doing (the approach described in the 4/23/2010 update to the question above), so that this question can be marked as answered.
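The "keep expanding until no aliases are found" loop from the update can also be done on the raw input line before it is handed to yy_scan_string(), which avoids re-running yyparse() for every pass. Below is a minimal sketch in plain C, not a tested flex/bison integration. The alias table, the function names and the MAX_EXPANSIONS limit are all made up for illustration, and it assumes shell-like behaviour where only the first word of a line is alias-expanded (expanding every word would make hello -> echo hello loop forever).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical alias table; the names and values are illustrative only. */
struct alias { const char *name; const char *value; };

static const struct alias aliases[] = {
    { "hello", "echo hello" },
    { "hi",    "hello world" },   /* nested: expands to another alias */
};

#define MAX_EXPANSIONS 16  /* arbitrary guard against self-referential aliases */

/* Return the alias value for the word of length len, or NULL if none. */
static const char *find_alias(const char *word, size_t len)
{
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; i++) {
        if (strlen(aliases[i].name) == len &&
            strncmp(aliases[i].name, word, len) == 0)
            return aliases[i].value;
    }
    return NULL;
}

/*
 * Copy line into out, then replace its first word by its alias value
 * repeatedly until the first word is no longer an alias.
 * Returns 0 on success, -1 if the result does not fit or the
 * expansion limit is reached.
 */
int expand_aliases(const char *line, char *out, size_t outsz)
{
    char tmp[1024];

    if (strlen(line) >= outsz)
        return -1;
    strcpy(out, line);

    for (int n = 0; n < MAX_EXPANSIONS; n++) {
        size_t len = strcspn(out, " \t\n");        /* length of first word */
        const char *value = find_alias(out, len);
        if (value == NULL)
            return 0;                              /* nothing left to expand */

        /* Build "<alias value><rest of line>" and copy it back. */
        int written = snprintf(tmp, sizeof tmp, "%s%s", value, out + len);
        if (written < 0 || (size_t)written >= sizeof tmp ||
            (size_t)written >= outsz)
            return -1;
        strcpy(out, tmp);
    }
    return -1;                                     /* probably an alias loop */
}
```

With this table, expand_aliases("hi there", buf, sizeof buf) goes through hello world there and ends with echo hello world there in buf; that fully expanded string is what would then be passed to yy_scan_string() once, followed by a single yyparse().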

Related

Bash - loop through array of objects and combine them

I'm trying to create a for-loop that goes through all the items of an array and adds them to a string. The tags are given as a single string with the format "tag1 tag2 tag3", and the tagging parameter can be given as many times as I want in a single command, with the syntax "-tag tag1 -tag tag2 -tag tag3". I'm unable to create a for loop for the job, and I'm a little confused about what is wrong with my code.
TAGS="asd fgh jkl zxc bnm" # Amount of tags varies, but there is always at least one
ARRAY=($TAGS)
TAGSTOBEADDED=""
for i in "$ARRAY[@]"
do
STRINGTOBEADDED="-tag ${ARRAY[$i]}"
$TAGSTOBEADDED=$TAGSTOBEADDED+$STRINGTOBEADDED
done
command $TAGSTOBEADDED
First, your array syntax is wrong, as @oguz ismail said. To iterate through the array items you should use this:
for i in "${ARRAY[@]}"; do echo "$i"; done
Second, $TAGSTOBEADDED=$TAGSTOBEADDED+$STRINGTOBEADDED also fails.
Variables are set like so: var="$var 123". You don't need $ in front of the variable name when you assign to it. Back to the code: in this example you don't even need an array, just use the TAGS variable (without ""):
for i in $TAGS; do TAGSTOBEADDED+=" -tag $i"; done
First: avoid storing lists of things in space-delimited strings (as you're currently doing with TAGS and TAGSTOBEADDED) -- there are a bunch of things that can go wrong if they have any "funny" characters (or if IFS gets changed). Use an array instead. Storing them as a string and then converting doesn't help; all of the same potential problems apply during the conversion.
I also recommend using lower- or mixed-case variable names in scripts, since there are a bunch of all-caps names with special meanings, and accidentally using one of those for something else can have weird effects. So, to define the array of tags, I'd just use this:
tags=(asd fgh jkl zxc bnm)
You also have a number of syntax errors in the script. In this line:
for i in "$ARRAY[@]"
... the shell will try to expand $ARRAY as a plain variable (not an array), and then treat "[@]" as just some unrelated characters that go after it. You need braces around the variable reference (like "${ARRAY[@]}") any time you're doing anything nontrivial with a variable reference. BTW, this idiom -- including double-quotes, braces, square-brackets and at-sign -- is what you almost always want when getting the contents of an array.
In this line:
STRINGTOBEADDED="-tag ${ARRAY[$i]}"
$i will expand to one of the array elements, not its index. That is, it'll expand to something like:
STRINGTOBEADDED="-tag ${ARRAY[asd]}"
...which doesn't make any sense. You just want
STRINGTOBEADDED="-tag $i"
...except you don't want that either, because (as I said before) storing lists of things space-delimited in a string is a bad idea. But I'll get to that because fixing it will involve the next line:
$TAGSTOBEADDED=$TAGSTOBEADDED+$STRINGTOBEADDED
There are two problems here: you don't want a dollar sign on the variable being assigned to ($varname gets the value of a variable; anytime you're setting it, don't use the $). Also the + isn't needed to add strings, you just stick them end to end. Well, you'd need to add a space in between, something like one of these:
TAGSTOBEADDED=$TAGSTOBEADDED" "$STRINGTOBEADDED
TAGSTOBEADDED="$TAGSTOBEADDED $STRINGTOBEADDED"
(Generally, you should have double-quotes around all variable references; on the right side of a plain assignment is one of the few places it's safe to leave them unquoted, but I tend to prefer to just double-quote always rather than try to remember all of the exceptions about where it's safe and where it isn't. Plus, quoting just the space looks weird.)
But you don't want to do that either, because (again) space-delimited strings are a bad way to do things. Use an array. So before the loop, create an empty array instead of an empty string:
tagstobeadded=()
...and then inside the loop, append to it with +=( ):
tagstobeadded+=(-tag "$i")
...and then at the end, use it with all the appropriate quotes, braces, etc:
command "${tagstobeadded[@]}"
So, with all of these changes, here's what I'd recommend:
tags=(asd fgh jkl zxc bnm)
tagstobeadded=()
for i in "${tags[@]}"
do
  tagstobeadded+=(-tag "$i")
done
command "${tagstobeadded[@]}"

error in looping over files, -fs- command

I'm trying to split some datasets in two parts, running a loop over files like this:
cd C:\Users\Macrina\Documents\exports
qui fs *
foreach f in `r(files)' {
use `r(files)'
keep id adv*
save adv_spa*.dta
clear
use `r(files)'
drop adv*
save fin_spa*.dta
}
I don't know whether what is inside the loop is correctly written but the point is that I get the error:
invalid '"e2.dta'
where e2.dta is the second file in the folder. Does this message refer to the loop or maybe what is inside the loop? Where is the mistake?
You want lines like
use "`f'"
not
use `r(files)'
given that fs (installed from SSC, as you should explain) returns r(files) as a list of all the files whereas you want to use each one in turn (not all at once).
The error message was informative: use is puzzled by the second filename it sees (as only one filename makes sense). The other filenames are ignored: use fails as soon as something is evidently wrong.
Incidentally, note that putting "" around filenames remains essential if any of them include spaces.

Grammar file (grammar.txt)

I am currently working on a grammar file and I am reading the grammar.txt file.
The first 20 lines are new to me.
%s/^\d*\.\s*(\w*)
%s/^\d*\.\s*\(\w*\)
%s/^\d*\.\s*\(\w*\)/<\1>
%s/^\d*\.\s*\(\w*\)/\1
%s/\<\(\w*\)\>
%s/"\w*\"
%s/"\(\w*\)\"/_\1_/g
%s/"\(\w*\)\"/&\1&/g
%s/"\(\w*\)\"/123456\1/g
%s/"\(\w*\)\"/**\1**/g
%s/"\(.*\)\"/$\1$/g
%s/"\(\w*\)\"/$\1$/g
%s/"/'/g
%s/'\(\w*\)'\/$\1$/g
Does anyone know what these lines refer to?
This looks like a list of substitution commands someone tried to run in vim.
It seems that person didn't know how to use them and was trying to figure it out.
The proper structure is %s/match/replacement/flags
% is the range (every line in the file) and s is the substitute command,
match is the regular expression that you are looking for,
replacement is what the match will be replaced with,
flags are substitution flags, in this case g, which replaces all occurrences on each line instead of only the first.
See vim's documentation on search and replace for more info.

Stable text feed for vim through vimserver

I am searching for a highly stable way to feed text (the output of a program) into vim through vimserver. Assume that I have started a (g)vim session with gvim --servername vim myfile. The file myfile contains a (unique) line OUT: which marks the position where the text should be pasted. I can achieve this straightforwardly from the command line with vim --servername vim --remote-send ':%s/OUT:/TEXT\\rOUT:/<Enter>', and I can repeatedly feed more text using the same command. Inside a C program I can execute it with system().
However, TEXT is dynamic and arbitrary (received as a stream in the C program); it has to be passed on the command line and hence needs to be escaped. Furthermore, with the substitute command %s vim jumps to the position where TEXT is inserted. I would like to find a way to paste large chunks of arbitrary text seamlessly into vim. One idea is to have vim read from a POSIX pipe with :r pipe and to write the string to the pipe from within the C program. Ideally, I could keep editing the same file manually without noticing that output is being added at OUT:, as long as this location is outside the visible area.
The purpose of this text feed is to create a command-line-based front end for scripting languages. Blocks of input are entered manually by the user in a vim buffer and sent to the interpreter through a pipe using vim's :! [interpreter] command. The [interpreter] can of course write the output to stdout (preceded by the original lines of input), in which case the input line is replaced by input and output (to be distinguished by some leading key characters, for instance). However, commands might take a long time to produce the actual output while the user wants to continue editing the file. Therefore my idea is to have [interpreter] return OUT: immediately and to append subsequent lines of output in this place through vimserver as they become available. The output must be inserted in a way that does not disturb or corrupt edits made by the user at the same time.
EDIT
The proposed solutions seem to work.
However there seem to be at least two caveats:
* If I send text two or more times this way, the `` part of the commands will not take me back to the original cursor position (and even if I do it just once, the marks are still modified, which may interrupt the user editing the file manually).
* If the user opens a different buffer (e.g. the online help), the commands will fail (or maybe insert the text into the current buffer).
Any ideas?
EDIT: After actually trying, this should work for you:
vim --servername vim --remote-send \
":set cpo+=C<CR>/OUT:<CR>:i<CR>HELLO<CR>WORLD<CR>.<CR>\`\`"
As far as I can see, the only caveats would be a period on a line by itself, which would terminate :insert instead of being inserted, and <..> sequences that might be interpreted as keypresses. Also, you need to replace any newlines in the text with <CR>. However, you have no worries about regular expressions getting muddled, the input is not the command line, the amount of escaping necessary is minimal, and the jumping is compensated for with the double backticks.
Check out :help 'cpoptions', :help :insert, and :help ''.
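Since the text comes from a C program, the two conversions mentioned above (newlines to <CR>, and < so that it is not read as the start of a key name) could be done with a small helper before building the --remote-send string. This is only a sketch: to_vim_keys is a made-up name, and it relies on <lt> being Vim's key notation for a literal <. It does not do any shell quoting for system(), and it does not deal with a line consisting of a single period.

```c
#include <stddef.h>
#include <string.h>

/*
 * Convert text to Vim key notation for use with --remote-send:
 *   '\n' becomes "<CR>" and '<' becomes "<lt>"; everything else is copied.
 * Hypothetical helper. Returns 0 on success, -1 if out is too small.
 */
int to_vim_keys(const char *text, char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;

    for (; *text != '\0'; text++) {
        char single[2] = { *text, '\0' };
        const char *rep;

        if (*text == '\n')
            rep = "<CR>";
        else if (*text == '<')
            rep = "<lt>";
        else
            rep = single;

        size_t n = strlen(rep);
        if (used + n + 1 > outsz)      /* keep room for the terminator */
            return -1;
        memcpy(out + used, rep, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```

For example, "HELLO\nWORLD" becomes HELLO<CR>WORLD, which is the form used in the command above. The result still has to be quoted for the shell, or passed without a shell via fork/exec instead of system().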
Instead of dealing with the escaping, I would rather use lower-level functions. Use let lnum = search('^OUT:$') to locate the marker line, and call setline(lnum, [text, 'OUT:']) to insert the text and the marker line again.

How can I check if an instance of a substring is at the end of a string and occurs just once (say using strstr)?

I want to check whether the user is entering something like ls& or ls & so that I can set the bg_flag for background jobs in my shell. However, the following code can't distinguish these from ls&&: I don't want the command to be considered a background job if the user enters ls&& or anything with more than one & at the end (I am not sure whether that matches what a Linux shell does).
if (strstr(args[arg_count-1],"&")!=NULL)
//if (strcmp(args[arg_count-1],"&")==0)
{
bg_flag=1;
printf("I am a background job %d ",getpid());
}
Please let me know what is the appropriate method for fixing this?
To do that reliably, you need to properly parse the command in the same way the shell does it. Usually you define a grammar for that and use a parser generator. Everything else is just guessing and will most likely fail.
For example, consider this:
some_program \&&
It ends with two ampersands (&&), but it will still be a background process because the first ampersand is escaped (\&).
However, treating the suffix \&& as "background" is not correct either, because
some_program \\&&
would not be a background process (but an incomplete command instead).
As long as you don't define a proper grammar, it's very likely that you will not catch everything correctly, as my two examples show.
Another kind of problem is programs that detach themselves from the terminal (sometimes called daemonization). They are not backgrounded by the shell; they do it themselves.
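If a full grammar is overkill for a toy shell, a check on the raw command line that at least covers the cases discussed here could look like the sketch below. It is a heuristic and not a parser: it ignores quoting and everything else in shell syntax, and is_background_command and backslashes_before are made-up names. It treats a trailing & as "background" only if that & is unescaped and is not the second half of an unescaped &&.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count the backslashes immediately before position idx in s. */
static size_t backslashes_before(const char *s, size_t idx)
{
    size_t n = 0;
    while (idx > 0 && s[idx - 1] == '\\') {
        n++;
        idx--;
    }
    return n;
}

/*
 * Heuristic (hypothetical helper): true if cmd ends, ignoring trailing
 * whitespace, in a single unescaped '&'. Quotes are not handled.
 */
bool is_background_command(const char *cmd)
{
    size_t len = strlen(cmd);

    while (len > 0 && isspace((unsigned char)cmd[len - 1]))
        len--;
    if (len == 0 || cmd[len - 1] != '&')
        return false;

    size_t last = len - 1;                 /* index of the final '&' */
    if (last > 0 && cmd[last - 1] == '&') {
        /* "...&&": background only if the first '&' is escaped (\&&). */
        return backslashes_before(cmd, last - 1) % 2 == 1;
    }
    /* Single trailing '&': background unless it is escaped (\&). */
    return backslashes_before(cmd, last) % 2 == 0;
}
```

This has to run on the whole input line (or on the last argument before any further splitting). It accepts ls&, ls & and some_program \&&, and rejects ls&& and some_program \\&&, matching the examples above.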
