How to handle labels in a scripting language I'm writing? - c

So I've been stewing over this for a long time, thinking about it. Here's a code example first, and then I'll explain it.
:main
dostuff
otherlabel
:otherlabel
dostuff
Alright so in this example, main is where the code starts, and it 'calls' the label 'otherlabel'. This is really just a shortcut for a jump command that changes execution to a different location in memory. My problem though is, how do I handle these labels so that they don't have to be declared before they are called?
At the moment, I'm doing a single step compilation reading straight from the source and outputting the bytecode. I am simply handling labels and adding them to a dictionary when I find them. And then I replace 'otherlabel' with a jump command to the correct location in code. But in this case that code wouldn't compile.
I've thought of a few ways to do this:
First is handling labels before anything else but this requires me to do everything in two steps and I have to deal with the same code twice, this slows down the process and just seems like a mess.
Second is queueing up the label calls until AFTER I've gone through the entire file and compiled everything else and then dealing with them, this seems much cleaner.
I'm writing this in C so I'd rather not implement complex data structures, I'm looking for the most straight forward way to handle this.

Use multiple passes. One pass isn't going to suffice for a scripting language, especially when you are getting to the more complex structures.
In a first pass, before compiling, construct your dictionary of labels.
In a later pass, when the compiling happens, just use that dictionary.

You could use "backpatching", although it sounds like that's what you've tried already; and it could be consstrued as a complex structure.
When you encounter a call to an undefined label, you emit the jump with a blank address field (probably into a buffer, otherwise this becomes the same as "multipass" if you have to re-read the file to patch it); and you also store a pointer to the blank field in a "patch-up" list in the dictionary. When you encounter the label definition, you fill-in all the blanks in the list, and proceed normally.

Related

Should I keep source files in memory while parsing?

I'm writing the front-end part of an interpreter and I initially disliked the idea of just dumping all the source files into memory and then referencing that text directly. So the tokenizer reads from a char buffers and builds the token stream.
However, I have reached the parsing side of things and it hit me that because I would want to output nice errors and warnings that show the malformed line of source code. I guess I could put column numbers in the tokens, but then by error messages would be like getting directions by telephone: "It's in file X, on line Y, column Z, right next to the curly brace, you know the one. If you hit the semicolon, you've gone to far."
I seem to have put myself into a situation where I want to have my cake and eat it too. I want nice messages, but I don't want to hog memory.
It there something I'm missing? Or is loading the source in memory the way to go?
When there's an error to report to the user, it hardly matters how long in milliseconds it takes to report it.
I'd keep your tokenized stream in memory to keep your interpreter fast. (Actually, you should switch to a threaded interpreter or even a bad one pass compiler to enhance the execution rate).
When you encounter an error, go to the disk, fetch the line(s) of interest, and show them to the user. If he doesn't make any errors, this costs you zero. If he makes a small number of errors, that may be tiny bit inefficient but the user won't know. If he makes large number of errors, the file content of the files containing errors are going to read by the OS into its local cache, which is likely bigger than your programs anyway, and so access will be more efficient than if you kept the source entirely on the disk.
Better idea: mmap your sources in the first place, if you can. Fall back to slurping the whole file if you're reading from a pipe or something.
After parsing, you may want to call madvise(MADV_DONTNEED) (but only if it was originally mmaped) to advise the kernel to drop it from the cache (but still keep it available for errors) ... but this is probably not necessary, and may even not be a good idea, depending on your compiler design (e.g. are identifiers still pointing, or are they interned to a single, separate, allocation).

Accurately count number of keywords "if", "while" in a c file

Are there any libraries out there that I can pass my .c files through and will count the visible number of, of example, "if" statements?
We don't have to worry about "if" statement in other files called by the current file, just count of the current file.
I can do a simple grep or regex but wanted to check if there is something better (but still simple)
If you want to be sure it's done right, I'd probably make use of clang and walk the ast. A URL to get you started:
http://clang.llvm.org/docs/IntroductionToTheClangAST.html
First off, there is no way to use regular expressions or grep to give you the correct answer you are looking for. There are lots of ways that you would find those strings, but they could be buried in any amount of escape characters, quotations, comments, etc.
As some commenters have stated, you will need to use a parser/lexer that understands the C language. You want something simple, you said, so you won't be writing this yourself :)
This seems like it might be usable for you:
http://wiki.tcl.tk/3891
From the page:
lexes a string containing C source into a list of tokens
That will probably get you what you want, but even then it's not going to be trivial.
What everyone has said so far is correct; it seems a lot easier to just grep the shit out of your file. The performance hit of this is neglagible compared to the alternative which is to go get the gcc source code (or whichever compiler you're using), and then go through the parsing code and hook in what you want to do while it parses the syntax tree. This seems like a pain in the ass, especially when all you're worried about is the conditional statements. If you actually care about the branches, you could actually just take a look at the object code and count the number of if statements in the assembly, which would correctly tell you the number of branches (rather than just relying on how many times you typed a conditional, which will not translate exactly to the branching of the program).

How to evaluate if a code is correct against a submitted solution

I´m searching information about how to compare two codes and decide if the code submitted by someone is correct or not (based on a solution code defined before).
I could compare the output but many codes may have the same output. Then I think I must compare someway the codes and give a percentage of similitude.
Anybody can help me?
(the language code is C but I think this isn´t important)
Some of my teachers used online automated program grading systems like http://web-cat.org/
In the assignment they would specify a public api you must provide, and then they would just write tests against your functions, much like unit tests. They would intentionally pick tests that would exploit boundary conditions and other things students are notorious for not thinking about, and just call your code with many different inputs to try to get your code to fail.
Sometimes they would hardcode the expected values, other times they would allow values within a range, and other times they just did the assignment themselves and made it so your own code has to match the results produced by their code.
Obviously, not all programs can be effectively graded this way. It's also kinda error prone in that sometimes even the teacher made a mistake and overflowed an int or something, then the correct student submissions wouldn't match the teachers incorrect results. But, a system doesn't need to be perfect to be useful. But I think this raises an important point in that manually grading by reading the code won't necessarily reveal all mistakes either.
Another possibility is copy the submitted code, strip out all of the white space and search for substrings that must exist for the code to be correct and/or substrings that cannot exist for the code to be considered correct. The troublesome bit might be setting up to allow for some of the more tricky requirements such as [(a or c),((a or b) and c),((a or b) and c)], where the variables are the result of a boolean check as to if the substring related to the variable exists within the code.
For example, [("printf"),("for"), (not "1,2,3,4,5,6,7,9,10")], would require that "printf" and "for" be substrings in the code, while "1,2,3,4,5,6,7,9,10" i I'm not familiar with C, so I'm I'm assuming here that "printf" is required to be able to print anything without involving output streams, which could be accounted for by something like [("printf" or "out"),("for"), (not "1,2,3,4,5,6,7,9,10")], where "out" is part of C code required to make use of output streams.
It might be possible to automatically find required substrings based on a "correct" code, but as others have mentioned, there are alternative ways to do things. Which is why hard-coding the "solution" is probably required. Even so, it's quite possible that you'll miss a required substring, and it'll be marked as wrong, but it's probably the only way you can do what you ask with some degree of success.
Regular expressions might be useful here.

How to view variables during program execution

I am writing a relatively simple C program in Visual C++, and have two global variables which I would like to know the values of as the program runs. The values don't change once they are assigned, but my programming ability is not enough to be able to quickly construct a text box that displays the values (I'm working in Win32) so am looking for a quick routine that can perhaps export the values to a text file so I can look at them and check they are what they ought to be. Values are 'double'.
I was under the impression that this was the purpose of the debugger, but for me the debugger doesn't run as the 'file not found' is always the case.
Any ideas how I can easily check the value of a global variable (double) in a Win32 app?
Get the debugger working. You should maybe post another question with information about why it won't work - with as much info as possible.
Once you have done that, set a breakpoint, and under Visual C++ (I just tried with 2010), hover over the variable name.
You could also use the watch window to enter expressions and track their values.
If your debugger isn't working try using printf statements wherever the program iterates.
Sometimes this can be a useful way of watching a variable without having to step into it.
If however you wish to run through the program in debug mode set a breakpoint as suggested (in VS2010 you can right click on the line you want to set a breakpoint on).
Then you just need to go to Toolbars -> Debug Toolbar.
I usually like to put #ifdef _DEBUG (or write an appropriate macro or even extra code) to do the printing, and send to the output everything that can help me tracking what the program's doing. Since your variables are never changing, I would do so.
However, flooding the console with lots of values is bad imo, and in such cases I would rely on assertions and the debugger - you should really see why it's not working.
I've done enough Python and Ruby to tell you that debugging a complex program when all you have is a printf, although doable, is extremely frustrating and takes way longer than what it should.
Finally, since you mention your data type is double (please make sure you have a good reason for not using floats instead), in case you add some assertion, remember that == is to be avoided unless you know 100% that == is what you really really want (which is unlikely if your data comes from calculations).

Detecting code blocks when executing a Lua script line by line

This may sound like a silly question but I could see no mention anywhere of this particular problem. Basically:
I want to execute a Lua script line by line, primarily to have the ability to pause/resume execution anytime and anywhere I want. What I do is simple: load a chunk with luaL_loadbuffer() and then issue a lua_pcall().
Thing is... How can I properly detect Lua blocks in order to execute them atomically?
For instance, suppose there's a function in the script -- by executing the file line by line with the method described above, I can't seem to have a way to properly recognize the function, and in consequence its contents are loaded and called one by one.
I can imagine that one solution would be to manually handle a stack where I push control keywords I can recognize in the script ("function", "if", "do", etc) and their corresponding "end" clause if I find nested blocks. Once I push the final "end" I call the entire block, but that sounds simply awful. Surely there must be a better way of doing this.
Hope it makes some sense, and thank you!
Please take a look at Lua coroutines to implement that functionality for scripting game entities. The idea is to yield the coroutine in the sleep() and waitforevent() routines you mentioned, and then resume later (e.g. after a timeout or event occurs).
Use lua_sethook().
Note that you probably do want to experiment with what exact hook call granularity you need. I'd recommend executing chunks of bytecode instructions instead. One really long line may contain many instructions.

Resources