Is there any "almost-usable" static analysis tool for C (or C-like) programs that can automatically infer loop termination, at least for very simple programs?
I looked around a bit and found several research articles, a few prototypes, and even some tools (such as Frama-C) that try to infer some termination properties from annotated source code, but I was expecting to find at least one simple tool that you could just hand a C program and that would output: loop #N terminates/does not terminate/unknown.
(I know this is undecidable in the general case, but for some classes of loops semi-algorithms are possible).
I'd also be interested in tools that work for imperative languages other than C, such as Java.
Edit: an update to my question. I found LoopFrog, built on top of goto-cc, which seems to go in the direction I was looking for; however, I have not yet had time to understand precisely what its output means. Should it turn out to be the answer to my question, I'll post an update here.
I don't know if you have read these two blog posts (1, 2), but one of the “simple” tools you are looking for could be a script that, in parallel, launches Frama-C's value analysis as an ordinary sound abstract interpreter (able to infer that the end of a program is unreachable) and with its option -obviously-terminates (in which case it can infer that all executions of a program terminate). In both cases you might want to use a timeout. For the analysis with option -obviously-terminates, the timeout is mandatory, because the analysis fails to terminate if the analysed program does not itself terminate.
According to these blog posts I wrote, you should be able to diagnose the following examples, not all of which are entirely trivial:
char x, y;
main()
{
x = input();
y = input();
while (x>0 && y>0)
{
if (input() == 1)
{
x = x - 1;
y = input();
}
else
y = y - 1;
}
}
Terminates
char x, y;
main()
{
x = input();
y = input();
while (x>0 && y>0)
{
// Frama_C_dump_each();
if (input())
{
x = x - 1;
y = x;
}
else
{
x = y - 2;
y = x + 1;
}
}
}
Terminates
char x;
main(){
x = input();
while (x > 0)
{
Frama_C_dump_each();
if (x > 11)
x = x - 12;
else
x = x + 1;
}
}
Terminates
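As a sanity check on the third example, the loop can simply be replayed for every possible starting value. This is a brute-force sketch, not what Frama-C's value analysis does; the function names are made up for illustration, and it assumes that char is signed and 8 bits wide, so the only starting values that enter the loop are 1..127.

```c
#include <assert.h>

/* Replays the loop body of the third example for one starting value.
   Returns the number of iterations until x <= 0, or -1 if the loop is
   still running after max_steps iterations. */
int iterations_until_exit(int x, int max_steps)
{
    int steps = 0;
    assert(max_steps >= 0);
    while (x > 0) {
        if (steps == max_steps)
            return -1;
        if (x > 11)
            x = x - 12;
        else
            x = x + 1;
        steps++;
    }
    return steps;
}

/* Returns 1 if the loop exits for every positive value of a signed
   8-bit char (assumption: char is signed and CHAR_BIT == 8). */
int all_starts_exit(void)
{
    int x0;
    for (x0 = 1; x0 <= 127; x0++)
        if (iterations_until_exit(x0, 1000) < 0)
            return 0;
    return 1;
}
```

Every start value ends up at 12 and then drops to 0, which is why a step bound of 1000 is far more than enough here.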
unsigned char u;
int main(){
while (u * u != 17)
{
u = u + 1;
}
return u;
}
Does not terminate.
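The verdict for this last example can be checked by hand: u only ever takes the values 0..255, and the product u * u is computed in int, so the loop exits only if some value in that range squares to 17. A small sketch of that enumeration follows; count_exits is a hypothetical helper, and the code assumes an 8-bit unsigned char and an int of at least 32 bits.

```c
#include <assert.h>

/* Counts the unsigned char values u for which u * u == target.
   As in the loop condition, u is promoted to int before the
   multiplication, so the product itself never wraps around. */
int count_exits(int target)
{
    int u, hits = 0;
    assert(target >= 0);
    for (u = 0; u <= 255; u++) {
        unsigned char c = (unsigned char)u;
        if (c * c == target)
            hits++;
    }
    return hits;
}
```

count_exits(17) is 0, so the condition u * u != 17 holds for every reachable value of u and the loop never exits.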
However, the option -obviously-terminates involved in these examples was not originally designed for this use (it was more of an optimisation for the analysis of a certain kind of program). I did not realise that in some rare cases, when this option was set, the analysis could terminate without the analysed program itself terminating. If you are willing to recompile from sources, this issue would be fixed by setting variable obviously_terminates to true (instead of false) in file state_set.ml. If you have reasons to think that this script is not the solution you are looking for, then don't bother: the issue seems rare, as I said. I only noticed it while trying to determine whether programs terminated in a more competitive setting.
I recently wrote a parser generator tool that takes a BNF grammar (as a string) and a set of actions (as a function pointer array) and outputs a parser (= a state automaton, allocated on the heap). I then use another function to run that parser on my input data and generate an abstract syntax tree.
The initial parser generation involves quite a lot of steps, and I was wondering whether gcc or clang is able to optimize this, given constant inputs to the parser generation function (and never using the pointer values, only dereferencing them). Is it possible to run the function at compile time and embed the result (i.e., the allocated memory) in the executable?
(Obviously, that would require link-time optimization, since the compiler would need to be able to check that the whole function does indeed give the same result for the same parameters.)
What you could do in this case is have code that generates code.
Have your initial parser generator as a separate piece of code that runs independently. The output of this code would be a header file containing a set of variable definitions initialized to the proper values. You then use this file in your main code.
As an example, suppose you have a program that needs to know the number of bits that are set in a given byte. You could do this manually whenever you need:
int count_bits(uint8_t b)
{
int count = 0;
while (b) {
count += b & 1;
b >>= 1;
}
return count;
}
Or you can generate the table in a separate program:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    FILE *header = fopen("bitcount.h", "w");
    if (!header) {
        perror("fopen failed");
        exit(1);
    }
    fprintf(header, "int bit_counts[256] = {\n");
    unsigned v;
    for (v = 0; v < 256; v++) {
        int count = 0;          /* reset for every byte value */
        uint8_t b = v;
        while (b) {
            count += b & 1;
            b >>= 1;
        }
        fprintf(header, " %d,\n", count);
    }
    fprintf(header, "};\n");
    fclose(header);
    return 0;
}
This creates a file called bitcount.h that looks like this:
int bit_counts[256] = {
0,
1,
1,
2,
...
8,
};
You can then include this file in your "real" code.
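To illustrate how the consuming code changes once a table exists, here is a small sketch. So that it compiles on its own, it uses a hand-written 16-entry table as a stand-in for the generated header; nibble_bits and count_bits_lut are names invented for this example. With the generated bitcount.h you would instead include the header and index bit_counts[b] directly.

```c
#include <stdint.h>

/* Stand-in for a generated table: number of set bits in each
   4-bit value (hand-written here, not produced by the generator). */
static const int nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Table-driven replacement for the loop in count_bits():
   one lookup for the low nibble, one for the high nibble. */
int count_bits_lut(uint8_t b)
{
    return nibble_bits[b & 0x0F] + nibble_bits[b >> 4];
}
```

The design point is the same as in the answer: the work of computing the table happens before the program runs, and the program itself only does lookups.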
I am working on a low-level fabric simulator. I have made some progress, and there are now a few points where I would appreciate help.
The input of the program is a DRF file, a format used by sewing machines to tell the needles where to move.
The 2D representation is accurate: I parse the DRF data into a polyline, apply tensions, and extrude the result in an OpenGL render.
Now I am trying to get the physics along the Z axis right. I tried two methods:
1) Assuming information
The first method is based on constraints of the manufacturing process:
we only consider the knots where a set of yarns interact, and compute the z separation at these crucial points. The result is mixed: good in many cases, but it needs a lot of patches to handle the cases where this assumption does not hold (for example, Bézier segments, or collisions in the zones between these crucial points). We gave up on this alternative when we saw how many special cases we would have to hardcode, probably creating additional glitches.
2) Custom 2D Engine
The second attempt approximates the yarns with 2D box colliders, checks collisions using a grid, and computes the new z as a function of the result. This is far more expensive, but it gives better results. However, there are some problems:
The box colliders do not fit the yarn exactly (there are ways to solve this, but it would be great to hear about alternatives).
There is no iterative process:
First we compute collisions pairwise and add to each collider a list of the colliders it touches. Then we sort this list and separate the yarns along the Z axis according to their radii, centred on 0. The last step smooths the discrete z values using Béziers or smoothing filters. This leads to further glitches.
If I extend this recomputation to all the collisions of the current collisions, I get weird results because the z changes propagate badly (maybe I am not doing this step correctly).
Some colliders are recomputed wrongly (the yarns computed first are more likely to be altered by the later ones, leading to glitches).
This is the result of the second approach (without recomputing z in the smoothing step):
And some of the glitches (mainly in the yarns computed first):
This is the collision engine:
Details of badly approximated curves:
At this point I have some questions:
Can I fix these glitches (or at least most of them)?
This covers both the glitches caused by bad approximation and the glitches caused by z recomputation.
Could a 3D engine like ODE do the job in a reasonable time?
If you need some specific code, don't hesitate to ask for it.
EDIT: OK, let's narrow things down.
Yesterday I tried some open-source engines without achieving good results.
500 collisions with joints crash the simulation, so I discarded that option.
My problem:
A: How I generate the yarns:
I have a set of points and I trace beziers between them.
CUBIC_BEZIER(p1x,p1y,p1z,p2x,p2y,p2z,p3x,p3y,p3z,p4x,p4y,p4z, npunts);
For each pair of points I add a collider:
p1.x = *(puntlinaux + k*NUMCOMP);
p1.y = *(puntlinaux + k*NUMCOMP + 1);
p2.x = *(puntlinaux + k*NUMCOMP + 4);
p2.y = *(puntlinaux + k*NUMCOMP + 5);
*bc = getCollider(&p1,&p2,x,y,z, radi_mm_fil , pbar->numbar);
where
BoxCollider2D getCollider(punt_st* p1, punt_st* p2, float *x, float *y, float *z, float radiFil,int numbar){
BoxCollider2D bc;
bc.numBar = numbar;
int i;
for(i = 0; i < MAX_COLLIDERS; i++) {
bc.collisions[i] = NULL;
}
bc.isColliding = 0;
bc.nCollisions = 0;
bc.nextCollider = NULL;
bc.lastCollider = NULL;
bc.isProcessed = 0;
punt_st center;
p1->x = p1->x;
p2->x = p2->x;
float distance = distancePunts_st(p1,p2);
bc.angle = atan2((p2->y - p1->y),(p2->x-p1->x));
//bc.angle = 0;
//DEBUG("getCollider\n");
//DEBUG("angle: %f, radiFil: %f\n", bc.angle*360/(2*3.141592), radiFil);
//DEBUG("Point: pre [%f,%f] post:[%f,%f]\n", p1->x, p1->y, p2->x, p2->y);
p1->y = p1->y;
p2->y = p2->y;
bc.r.min = *p1;
bc.r.max = *p2;
center = getCenterRect(bc.r);
bc.r.max.x = (center.x - distance/2);
bc.r.max.y = (center.y + radiFil) - 0.001f;
bc.r.min.x = (center.x + distance/2);
bc.r.min.y = (center.y - radiFil) + 0.001f;
bc.xIni= x;
bc.yIni= y;
bc.zIni= z;
return bc;
}
Then I add the collider to a grid to reduce the number of comparisons
and check the collisions with the Separating Axis Theorem
DEBUG("calc_sapo: checking collisions\n");
checkCollisions();
After that, I resolve the collisions, which yields a discrete z approximation.
DEBUG("calc_sapo: solving collisions\n");
resolveCollisions();
And then I apply a smooth function. This is the point where I am lost.
DEBUG("smoothing yarns\n");
smoothYarns();
To keep it simple, let's assume smoothYarns() does a simple mean value with the last and next z value:
for each collider of each yarn:
float aux = (*bc->zIni + *(bc-1)->zIni + *nextBC->zIni)/3;
float diff = aux - *bc->zIni;
*bc->zIni = aux;
Then we have to update all the colliders in contact with this recomputed collider:
void updateCollider(BoxCollider2D *bc, float diff){
int i ;
for(i = 0; i < bc->nCollisions; i++){
*bc->collisions[i]->zIni += diff;
}
}
This last step is messing up the whole simulation because the z values keep accumulating ...
I want to know why this does not converge as I expected, and what possible solutions there are for this problem.
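For reference, the three-point mean described above can be written so that every new z is computed from a snapshot of the old values, which makes the result independent of the order in which colliders are visited. This is only a sketch under the assumption that the z values of one yarn sit in a plain array; smooth_z is a hypothetical name and not part of the simulator's code.

```c
#include <assert.h>
#include <stddef.h>

/* Writes to out[i] the mean of in[i-1], in[i] and in[i+1].
   The two endpoints are copied unchanged. in and out must be
   different buffers, so that no average reads an updated value. */
void smooth_z(const float *in, float *out, size_t n)
{
    size_t i;
    assert(in != out);
    if (n == 0)
        return;
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (i = 1; i + 1 < n; i++)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}
```

For the input {0, 3, 0, 3, 0} this produces {0, 1, 2, 1, 0}. The in-place version in the question differs in that each average already sees the updated z of the previous collider.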
EDIT2:
This is the algorithm that detects collisions:
For each collider in the grid position
Compare with others in the grid
if colliding: add the colliding collider to the current collider
For each yarn:
Go through all the colliders (linked list)
For each collider from a yarn:
Bubble sort the colliding yarns by their z order
Compute z based on the yarns' radii (centred on 0)
float computeZ(BoxCollider2D *array[], int nYarns, BoxCollider2D *bc){
int i, antBar;
float radiConflicte = 0;
float zMainYarn;
finger_st *pfin = NULL; /* must be initialised: it is compared in the first loop */
antBar = -1;
for(i = 0; i<nYarns; i++){
if(pfin != array[i]->pfin){
float radiFil = getRadiFromBC2D(*array[i]); // veure getCollider
radiConflicte += radiFil;
pfin = array[i]->pfin;
}
}
antBar = -1;
pfin = NULL;
for(i = 0; i<nYarns; i++){
float radiFil = getRadiFromBC2D(*array[i]); // veure getCollider
if(pfin != array[i]->pfin){
radiConflicte -= radiFil;
*(array[i]->zIni) = radiConflicte;
if(array[i] == bc) zMainYarn = *(array[i]->zIni);
radiConflicte -= radiFil;
pfin = array[i]->pfin;
}
else *(array[i]->zIni) = *(array[i-1]->zIni);
}
This leads to an approach where the last yarns processed can alter the first ones and, ultimately, to glitches.
How can I avoid that?
Thanks a lot!
I had an earlier question about integrating Mathematica with functions written in C++.
This is a follow-up question:
If the computation takes too long I'd like to be able to abort it using Evaluation > Abort Evaluation. Which of the technologies suggested in the answers make it possible to have an interruptible C-based extension function? How can "interruptibility" be implemented on the C side?
I need to make my function interruptible in a way which will corrupt neither it, nor the Mathematica kernel (i.e. it should be possible to call the function again from Mathematica after it has been interrupted)
For MathLink-based functions, you will have to do two things (on Windows): use MLAbort to check for aborts, and call MLCallYieldFunction to yield the processor temporarily. Both are described in the MathLink tutorial by Todd Gayley from way back, available here.
Using the bits from my previous answer, here is example code to compute the prime numbers (in an inefficient manner, but that is what we need here for an illustration):
code =
"
#include <stdlib.h>
extern void primes(int n);
static void yield(){
MLCallYieldFunction(
MLYieldFunction(stdlink),
stdlink,
(MLYieldParameters)0 );
}
static void abort(){
MLPutFunction(stdlink,\" Abort \",0);
}
void primes(int n){
int i = 0, j=0,prime = 1, *d = (int *)malloc(n*sizeof(int)),ctr = 0;
if(!d) {
abort();
return;
}
for(i=2;!MLAbort && i<=n;i++){
j=2;
prime = 1;
while (!MLAbort && j*j <=i){
if(i % j == 0){
prime = 0;
break;
}
j++;
}
if(prime) d[ctr++] = i;
yield();
}
if(MLAbort){
abort();
goto R1;
}
MLPutFunction(stdlink,\"List\",ctr);
for(i=0; !MLAbort && i < ctr; i++ ){
MLPutInteger(stdlink,d[i]);
yield();
}
if(MLAbort) abort();
R1: free(d);
}
";
and the template:
template =
"
void primes P((int ));
:Begin:
:Function: primes
:Pattern: primes[n_Integer]
:Arguments: { n }
:ArgumentTypes: { Integer }
:ReturnType: Manual
:End:
";
Here is the code to create the program (taken from the previous answer, slightly modified):
Needs["CCompilerDriver`"];
fullCCode = makeMLinkCodeF[code];
projectDir = "C:\\Temp\\MLProject1";
If[! FileExistsQ[projectDir], CreateDirectory[projectDir]]
pname = "primes";
files = MapThread[
Export[FileNameJoin[{projectDir, pname <> #2}], #1,
"String"] &, {{fullCCode, template}, {".c", ".tm"}}];
Now, here we create it:
In[461]:= exe=CreateExecutable[files,pname];
Install[exe]
Out[462]= LinkObject["C:\Users\Archie\AppData\Roaming\Mathematica\SystemFiles\LibraryResources\
Windows-x86-64\primes.exe",161,10]
and use it:
In[464]:= primes[20]
Out[464]= {2,3,5,7,11,13,17,19}
In[465]:= primes[10000000]
Out[465]= $Aborted
In the latter case, I used Alt+"." to abort the computation. Note that this won't work correctly if you do not include a call to yield.
The general idea is that you have to check MLAbort and call MLCallYieldFunction in every expensive computation, such as large loops. Doing that in inner loops, as I did above, is perhaps overkill, though. One thing you could try is to factor the boilerplate code away using the C preprocessor (macros).
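A rough sketch of what such a macro could look like is given below. To keep it compilable without mathlink.h, it polls a plain flag named abort_requested as a stand-in for MLAbort; that flag, the macro ABORT_PENDING_AT and the function run_with_polling are all hypothetical names. In a MathLink program the flag would be replaced by MLAbort, and the yield call would be made at the same polling points.

```c
#include <assert.h>

/* Hypothetical stand-in for MLAbort (not part of MathLink). */
static volatile int abort_requested = 0;

/* True when iteration i is a polling point and an abort is pending.
   Polling only every `period` iterations keeps the check cheap. */
#define ABORT_PENDING_AT(i, period) \
    (((i) % (period)) == 0 && abort_requested)

/* Runs up to n iterations of placeholder work, polling every 1024
   iterations. abort_at simulates the user aborting at that iteration
   (-1 means never). Returns the number of iterations completed. */
long run_with_polling(long n, long abort_at)
{
    long i;
    assert(n >= 0);
    abort_requested = 0;
    for (i = 0; i < n; i++) {
        if (i == abort_at)
            abort_requested = 1;   /* simulated user abort */
        if (ABORT_PENDING_AT(i, 1024))
            break;
        /* ... one unit of real work would go here ... */
    }
    return i;
}
```

With an abort simulated at iteration 1500, the loop stops at the next polling point, iteration 2048. The polling period trades responsiveness to aborts against overhead per iteration.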
Without ever having tried it, it looks like the Expression Packet functionality might work in this way: if your C code periodically goes back and asks Mathematica for some more work to do, then hopefully aborting execution on the Mathematica side will tell the C code that there is no more work to do.
If you are using LibraryLink to link external C code to the Mathematica kernel, you can use the Library callback function AbortQ to check if an abort is in progress.
I'd love some references, or tips, possibly an e-book or two. I'm not looking to write a compiler, just looking for a tutorial I could follow along and modify as I go. Thank you for being understanding!
BTW: It must be C.
Any more replies would be appreciated.
A great way to get started writing an interpreter is to write a simple machine simulator. Here's a simple language you can write an interpreter for:
The language has a stack and 7 instructions:
push <num> # push a number on to the stack
pop # pop off the top number on the stack
add # pop off the top 2 items on the stack and push their sum on to the stack. (remember you can add negative numbers, so you have subtraction covered too). You can also get multiplication by creating a loop using some of the other instructions with this one.
ifeq <address> # examine the top of the stack, if it's 0, continue, else, jump to <address> where <address> is a line number
jump <address> # jump to a line number
print # print the value at the top of the stack
dup # push a copy of what's at the top of the stack back onto the stack.
Once you've written a program that can take these instructions and execute them, you've essentially created a very simple stack based virtual machine. Since this is a very low level language, you won't need to understand what an AST is, how to parse a grammar into an AST, and translate it to machine code, etc. That's too complicated for a tutorial project. Start with this, and once you've created this little VM, you can start thinking about how you can translate some common constructs into this machine. e.g. you might want to think about how you might translate a C if/else statement or while loop into this language.
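A minimal sketch of such a machine is shown below, to make the idea concrete. It is one possible reading of the instruction list, not a reference implementation: the names (struct Instr, run, the OP_ constants) are invented, addresses are taken to be 0-based indices into the program array, the stack is capped at 256 entries, and printed values are also copied into an array so that the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

enum Op { OP_PUSH, OP_POP, OP_ADD, OP_IFEQ, OP_JUMP, OP_PRINT, OP_DUP };

struct Instr {
    enum Op op;
    long arg;   /* number for push, target index for ifeq/jump */
};

#define STACK_MAX 256

/* Executes prog[0..n-1]. Each printed value is written to stdout and,
   while there is room, stored in printed[]. Returns the number of
   values printed, or -1 on a stack error, a bad jump target, or when
   max_steps instructions have run without the program ending. */
int run(const struct Instr *prog, int n,
        long *printed, int max_printed, long max_steps)
{
    long stack[STACK_MAX];
    int sp = 0;          /* number of items currently on the stack */
    int pc = 0;          /* index of the next instruction */
    int nprinted = 0;

    assert(n >= 0);
    while (pc >= 0 && pc < n) {
        const struct Instr in = prog[pc++];
        if (max_steps-- <= 0)
            return -1;
        switch (in.op) {
        case OP_PUSH:
            if (sp == STACK_MAX) return -1;
            stack[sp++] = in.arg;
            break;
        case OP_POP:
            if (sp < 1) return -1;
            sp--;
            break;
        case OP_ADD:
            if (sp < 2) return -1;
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_IFEQ:    /* top is 0: continue; otherwise jump */
            if (sp < 1) return -1;
            if (stack[sp - 1] != 0) pc = (int)in.arg;
            break;
        case OP_JUMP:
            pc = (int)in.arg;
            break;
        case OP_PRINT:
            if (sp < 1) return -1;
            printf("%ld\n", stack[sp - 1]);
            if (nprinted < max_printed) printed[nprinted] = stack[sp - 1];
            nprinted++;
            break;
        case OP_DUP:
            if (sp < 1 || sp == STACK_MAX) return -1;
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        }
    }
    return pc == n ? nprinted : -1;
}
```

As a usage example, the five-instruction program push 3, print, push -1, add, ifeq 1 counts down and prints 3, 2, 1: after each add, ifeq jumps back to the print at index 1 until the top of the stack reaches 0. A text front end that turns lines such as "push 3" into struct Instr values is the natural next step.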
Edit:
From the comments below, it sounds like you need a bit more experience with C before you can tackle this task.
What I would suggest is to first learn about the following topics:
scanf, printf, putchar, getchar - basic C IO functions
struct - the basic data structure in C
malloc - how to allocate memory, and the difference between stack memory and heap memory
linked lists - and how to implement a stack, then perhaps a binary tree (you'll need to understand structs and malloc first)
Then it'll be good to learn a bit more about the string.h library as well
- strcmp, strdup - a couple of string functions that will be useful.
In short, C has a much steeper learning curve than Python, simply because it's a lower-level language and you have to manage your own memory, so it's good to learn a few basic things about C before trying to write an interpreter, even if you already know how to write one in Python.
The only difference between an interpreter and a compiler is that instead of generating code from the AST, you execute it in a VM. Once you understand this, almost any compiler book, even the Red Dragon Book (first edition, not second!), is enough.
I see this is a bit of a late reply; however, since this thread showed up in second place in the result list when I searched for how to write an interpreter, and no one has mentioned anything very concrete, I will provide the following example:
Disclaimer: This is just some simple code I wrote in a hurry to have a foundation for the explanation below, and it is therefore not perfect, but it compiles and runs, and seems to give the expected answers.
Read the following C code from bottom to top:
#include <stdio.h>
#include <stdlib.h>
double expression(void);
double vars[26]; // variables
char get(void) { char c = getchar(); return c; } // get one byte
char peek(void) { char c = getchar(); ungetc(c, stdin); return c; } // peek at next byte
double number(void) { double d; scanf("%lf", &d); return d; } // read one double
void expect(char c) { // expect char c from stream
char d = get();
if (c != d) {
fprintf(stderr, "Error: Expected %c but got %c.\n", c, d);
}
}
double factor(void) { // read a factor
double f;
char c = peek();
if (c == '(') { // an expression inside parentheses?
expect('(');
f = expression();
expect(')');
} else if (c >= 'A' && c <= 'Z') { // a variable ?
expect(c);
f = vars[c - 'A'];
} else { // or, a number?
f = number();
}
return f;
}
double term(void) { // read a term
double t = factor();
while (peek() == '*' || peek() == '/') { // * or / more factors
char c = get();
if (c == '*') {
t = t * factor();
} else {
t = t / factor();
}
}
return t;
}
double expression(void) { // read an expression
double e = term();
while (peek() == '+' || peek() == '-') { // + or - more terms
char c = get();
if (c == '+') {
e = e + term();
} else {
e = e - term();
}
}
return e;
}
double statement(void) { // read a statement
double ret;
char c = peek();
if (c >= 'A' && c <= 'Z') { // variable ?
expect(c);
if (peek() == '=') { // assignment ?
expect('=');
double val = expression();
vars[c - 'A'] = val;
ret = val;
} else {
ungetc(c, stdin);
ret = expression();
}
} else {
ret = expression();
}
expect('\n');
return ret;
}
int main(void) {
printf("> "); fflush(stdout);
for (;;) {
double v = statement();
printf(" = %lf\n> ", v); fflush(stdout);
}
return EXIT_SUCCESS;
}
This is a simple recursive descent parser for basic mathematical expressions, supporting one-letter variables. Running it and typing some statements yields the following results:
> (1+2)*3
= 9.000000
> A=1
= 1.000000
> B=2
= 2.000000
> C=3
= 3.000000
> (A+B)*C
= 9.000000
You can alter get(), peek() and number() to read from a file or from a list of code lines. You should also write a function that reads identifiers (basically words). Then you expand the statement() function so that it can change which line it runs next, in order to do branching. Lastly, you add the branch operations you want to the statement function, like
if "condition" then
"statements"
else
"statements"
endif.
while "condition" do
"statements"
endwhile
function fac(x)
if x = 0 then
return 1
else
return x*fac(x-1)
endif
endfunction
Obviously you can choose whatever syntax you like. You need to think about how to define functions and how to handle arguments/parameter variables, local variables and global variables, and, if you want them, arrays and data structures, references/pointers, and input/output.
In order to handle recursive function calls you probably need to use a stack.
In my opinion it would be easier to do all this with C++ and the STL, where, for example, one std::map could hold local variables and another map could hold globals...
It is of course possible to write a compiler that builds an abstract syntax tree out of the code and then traverses this tree to produce either machine code or some kind of bytecode that is executed on a virtual machine (like Java and .NET). This gives better performance than naively parsing and executing line by line, but in my opinion that is NOT writing an interpreter. That is writing both a compiler and its target virtual machine.
If someone wants to learn to write an interpreter, they should try making the simplest practical working interpreter.
double sqrtIt(double x, double low_guess, double high_guess) {
int n = 10;
int num = 0;
while ( n > 0.000000000000001){
n = n / 10;
while (num < x && low_guess <= (low_guess * 10)){
low_guess = low_guess + n;
num = low_guess * low_guess;
}
}
return low_guess;
}
I've tried to use the code above to find the square root of a number. The function works fine most of the time, but when the number is 2, I get the "There is no source code available for the current location. Show disassembly" error at the line num = low_guess * low_guess;. I don't know what I did wrong. Also, what does "Show disassembly" do? Thanks
The "no source code available" message may indicate that you are not compiling in debug mode, so your IDE can't do a source-level debug. I think there's probably some confusion here, brought on by trying to deal with an IDE for the first time...
As others have said, you should probably declare n and num to be double, not int.
Probably, once you become more familiar with your development environment (as well as the language), some of these things will sort themselves out.
There are some problems with this function, but this seems to be a very strange error to get from that code. I think you made some other error somewhere else in your program, and that error is then causing this problem.
Is there a typo in your code?
low_guess <= (low_guess * 10) is always true for non-negative numbers...
Even if it worked, it would be pretty inefficient. I guess you wanted to write something like this:
#include <math.h>

double sqrtIt(double x)
{
    double guess1, guess2;
    guess1 = 1.0;
    do
    {
        guess2 = x / guess1;
        guess1 = (guess1 + guess2) / 2;
    }
    while (fabs(guess1 - guess2) > 0.0000001); /* repeat until the two guesses agree */
    return guess1;
}