Is there a static invariant discovery tool for C programs?

I'm looking for a tool that can statically discover invariants in C programs. I checked out Daikon but it discovers invariants only dynamically.
Is there a tool available for what I'm looking for? Thanks!

See "The SLAM project: debugging system software via static analysis". It claims to infer invariants statically for exactly what you asked about: C programs. The author, Tom Ball, is widely known for stellar work in program analysis.

If you mean "invariant" in the widest sense, as the linked Daikon page uses it, then the work of many static analysis tools can be described as "discovering invariants", just perhaps not the expressive invariants you were looking for.
Frama-C's value analysis accumulates its results, the possible values of all variables, for each statement. At the end of the analysis, it can thus present non-relational information about the set of values each variable can take at each statement of the program. In one example, the invariant found is that S is always 0, 1, 3 or 6 just before a selected instruction, for all executions of that deterministic program.
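For illustration, here is a small hypothetical program (not the one from the original answer) in which S takes exactly the values 0, 1, 3 and 6 each time the loop condition is reached. The `//@ assert` line is an ACSL annotation, the specification language Frama-C reads; to a C compiler it is only a comment. A sufficiently precise static value analysis could establish that set without running the code; the recording array here merely confirms it dynamically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: records the value of S each time control reaches
 * the loop test, so the invariant can be checked at run time. */
size_t triangular_loop_head_values(int seen[], size_t cap)
{
    size_t count = 0;
    int S = 0;
    for (int i = 1; ; i++) {
        /* ACSL annotation; a C compiler treats it as a plain comment. */
        //@ assert S == 0 || S == 1 || S == 3 || S == 6;
        assert(count < cap);
        seen[count++] = S;
        if (i > 3)
            break;
        S += i;
    }
    return count;
}
```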
The two hidden parameters in your question are the shape of the invariants you are interested in, and the shape of the programs you want to find these invariants for. For instance, SLAM, mentioned in Ira's answer, was designed to work on device driver code, and to infer invariants that just contain the necessary information for verifying proper use of system APIs. Another tool, Astrée, is famous for doing a very good job at inferring just the right invariants to demonstrate runtime safety of flight control software.
The two degrees of freedom make for a very large design space. You won't find anything that works for all kinds of C programs and infers all the invariants you might be interested in, but if you refine your question for specific application domains and kinds of invariants, you will have better chances to find relevant answers.

Related

Do people actually do OO in C? [closed]

Closed 10 years ago.
Member functions can be emulated in C by passing the this pointer explicitly. Virtual functions can be emulated by explicitly storing in every object a pointer to a global array of function pointers. Fine.
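A minimal sketch of the technique described, using hypothetical shape/rect types. A const struct of function pointers stands in for the "array of function pointers" (named fields read better than indices), and every "method" takes the object explicitly as its first parameter.

```c
#include <assert.h>

struct shape;  /* forward declaration */

/* Table of "virtual functions", shared by all objects of one type. */
struct shape_vtable {
    double (*area)(const struct shape *self);
    const char *(*name)(const struct shape *self);
};

/* Every object stores a pointer to its type's table. */
struct shape {
    const struct shape_vtable *vtbl;
};

struct rect {
    struct shape base;  /* first member, so a rect* converts to a shape* */
    double w, h;
};

static double rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *)self;  /* explicit "this" */
    return r->w * r->h;
}

static const char *rect_name(const struct shape *self)
{
    (void)self;
    return "rect";
}

static const struct shape_vtable rect_vtbl = { rect_area, rect_name };

void rect_init(struct rect *r, double w, double h)
{
    assert(w >= 0.0 && h >= 0.0);
    r->base.vtbl = &rect_vtbl;
    r->w = w;
    r->h = h;
}

/* "Virtual calls": dispatch through the object's table. */
double shape_area(const struct shape *s) { return s->vtbl->area(s); }
const char *shape_name(const struct shape *s) { return s->vtbl->name(s); }
```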
Now my question is, do people actually do this? I am wondering if it's worth teaching this technique, because I do not want to teach something to C freshmen that is practically never used in the real world.
(I need to fill the last day of a two-week introductory C course for people already familiar with OOP.)
Are there any relevant projects, libraries or frameworks that emulate OO in C in the manner described?
I have about twenty years' experience in C. It was the first compiled language I learned and I've never needed to move on, so it's been C and only C, all the way. I write code constantly at work and at home. I have published a library of lock-free data structures. I think I'm a competent C programmer.
With regard to your question, OO consists of a number of concepts. One, for example, is instantiation, e.g. a library with new() and delete() functions and instances of a given entity (stack, list, etc.). C supports this and it is, of course, a very functional and useful approach. I've used this approach for about fifteen years.
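A minimal sketch of the new()/delete() style of instantiation, using a hypothetical fixed-capacity stack of ints. In a real library the struct definition would usually live in the .c file so callers only ever handle a `struct stack *`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stack ADT (would normally be opaque to callers). */
struct stack {
    int *items;
    size_t len, cap;
};

struct stack *stack_new(size_t cap)
{
    assert(cap > 0);
    struct stack *s = malloc(sizeof *s);
    if (!s)
        return NULL;
    s->items = malloc(cap * sizeof *s->items);
    if (!s->items) {
        free(s);
        return NULL;
    }
    s->len = 0;
    s->cap = cap;
    return s;
}

void stack_delete(struct stack *s)
{
    if (!s)
        return;
    free(s->items);
    free(s);
}

/* Returns 0 on success, -1 if the stack is full. */
int stack_push(struct stack *s, int v)
{
    if (s->len == s->cap)
        return -1;
    s->items[s->len++] = v;
    return 0;
}

/* Returns 0 on success, -1 if the stack is empty. */
int stack_pop(struct stack *s, int *out)
{
    if (s->len == 0)
        return -1;
    *out = s->items[--s->len];
    return 0;
}
```

Each call to stack_new() yields an independent instance, which is the part of OO that C expresses naturally.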
Many years ago I began experimenting with another OO concept, one well supported in C++: inheritance. I wanted an entity which contained other entities. The problem then is exposing the API of the contained entities. You can do it, but the fact is, the C language does not naturally express such a concept and approach. It is not something I now use.
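To show the friction being described, here is a hypothetical counter/named_counter pair. The container must either re-export the contained API with one hand-written wrapper per function, or rely on the contained struct being the first member so a pointer to the container can be converted to a pointer to the contained entity.

```c
#include <assert.h>

/* "Base" entity with its own API. */
struct counter { int n; };
void counter_inc(struct counter *c) { c->n++; }
int counter_get(const struct counter *c) { return c->n; }

/* "Derived" entity that contains a counter. */
struct named_counter {
    struct counter base;  /* first member: its address equals the container's */
    const char *name;
};

/* Option 1: re-export the contained API by hand, one wrapper per function. */
void named_counter_inc(struct named_counter *nc) { counter_inc(&nc->base); }
int named_counter_get(const struct named_counter *nc) { return counter_get(&nc->base); }

/* Option 2 (no wrapper): callers convert the pointer themselves, e.g.
 * counter_inc((struct counter *)&nc), which is valid only because
 * base is the first member. */
```

Every new function on counter means another wrapper, or another cast scattered through client code, which is the maintenance cost the answer alludes to.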
My advice is: a knife is a knife, a fork is a fork. You can use either as the other, but it doesn't work well. C does not naturally support some (important) OO concepts, such as inheritance. Don't try to make C do these things. If you want to do this, use C++.
Yes, they do.
Are there any relevant projects, libraries or frameworks that emulate OO in C in the manner described?
I wouldn't call it "emulating" just because there's no first-class language support. See GObject.
A lot of projects use object-oriented paradigms in a C codebase. For various reasons they don't use C++ directly: for system-level or performance-intensive projects, other languages don't cut it, so it's a battle between C++ and C.
Why people emulate OO in C instead of using full-blown C++ is a topic of heated argument. Linus Torvalds once famously stated that C++ compilers are not trustworthy; he has little faith in C++-generated code.
The Linux kernel is a good example of implementing OO design patterns in C. You can read about how the kernel does it in this LWN.net article series:
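One idiom that appears throughout the kernel is an operations table combined with container_of(), which recovers the enclosing "subclass" from a pointer to an embedded "base" struct even when that struct is not the first member. The sketch below uses a simplified, portable container_of (the kernel's real macro also type-checks its argument), and the device/serial_port types are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct device;

/* "Interface": a table of operations. */
struct device_ops {
    int (*open)(struct device *dev);
};

struct device {
    const struct device_ops *ops;
};

/* Hypothetical "subclass" embedding struct device at a non-zero offset. */
struct serial_port {
    int baud;
    struct device dev;
    int opened;
};

static int serial_open(struct device *dev)
{
    struct serial_port *sp = container_of(dev, struct serial_port, dev);
    sp->opened = 1;
    return sp->baud;
}

static const struct device_ops serial_ops = { serial_open };

void serial_port_init(struct serial_port *sp, int baud)
{
    sp->baud = baud;
    sp->opened = 0;
    sp->dev.ops = &serial_ops;
}

/* Generic code only ever sees struct device. */
int device_open(struct device *dev) { return dev->ops->open(dev); }
```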
part1
part2
There is an extensive free document on the internet that covers how to implement a full range of OO design patterns in C:
ooc.pdf
You can find many other projects along the same road.
Examples:
pjsip
sofia
It may not be used in practice, but it is incredibly valuable to learn the concept of the equivalence between member functions and functions that take the object as the first parameter. Having this concept in the back of their head will help them in many problems they will encounter down the road.
Day in and day out I see people asking questions on Stack Overflow about why they can't pass a member function to something that requires a function pointer, and things like that. They think that member functions are just some magical functions that are part of an object, and over-complicate the whole situation. If they realized that member functions are equivalent to functions that take the object as the first parameter, then both the problem they're having (to call the method, they need both the member function pointer and the object) and the possible solutions (pass the object in separately, or make some kind of closure that captures the object) would become apparent. Too many people just pretend that OO is "magic" and don't understand this.
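The "pass the object in separately" solution is the usual C callback convention: the API accepts a plain function pointer plus an opaque context pointer, and the "method" is an ordinary function whose first parameter is the object. A minimal sketch with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* A plain function pointer cannot carry an object, so the API also takes
 * the "object" as an opaque context pointer. */
typedef void (*int_visitor)(void *ctx, int value);

void for_each_int(const int *xs, size_t n, int_visitor fn, void *ctx)
{
    for (size_t i = 0; i < n; i++)
        fn(ctx, xs[i]);
}

/* The "object" whose "method" we want to use as a callback. */
struct summer {
    long total;
    size_t calls;
};

/* The "method": an ordinary function taking the object first. */
static void summer_add(void *self, int value)
{
    struct summer *s = self;
    s->total += value;
    s->calls++;
}
```

Calling for_each_int(xs, n, summer_add, &s) supplies both halves, the function and the object, which is exactly what a bound member function bundles together.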
In functional programming, we often teach people how data structures and local variables and all that stuff could be written purely in terms of manipulation of functions. Not that this is practical -- it would probably be inefficient -- but this impresses upon them something about the power of functions. And it helps them to understand things in a different way. And maybe down the road if they write a compiler or something, these equivalences will come in handy.
Computer science is all about equivalences and reductions, and how to think about one problem in terms of another. We reduce 3-SAT to subset sum, not because that's how we would actually solve 3-SAT, but because it shows that subset sum is NP-complete.
Every once in a while, I come across a piece of code written by someone else, where non-instance methods take a pointer to a structure as an argument, and I see a pattern and a light bulb goes off in my head, and I say, ah-ha, this can be refactored into an instance method, because I know about this equivalence. So you see, knowing these equivalences also helps us to write better, simpler code.
Check out TI's "DSP Algorithm Standard" / xDAIS framework.
There's a generic C API that every conforming DSP algorithm implementation implements (sorry for the tautology). The need for all this "art" stems from several issues common in the DSP world:
relatively small RAMs
multiple data channels (often parallel/concurrent)
complex algorithm usage patterns
something else I forget
The standard and framework aim at making it easier for DSP engineers to use 3rd party DSP algorithms.
There's an interface to configure an algorithm instance and query its memory requirements (based on the configuration) and there are support functions that actually manage the memory.
Some memory areas, scratchpads, can be allocated temporarily and given to an algorithm instance when it's active and taken away from it when it's inactive and given to another instance, effectively shared.
There's also functionality (and APIs) to move instance memory buffers to defragment memory.
There's more, but I'd need to reread the docs to recall the details.
See IALG_*() and ALG_*() interface methods for example.
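To make the memory-management idea concrete, here is a much-simplified hypothetical interface in the same spirit: the algorithm reports its memory needs from its configuration, the framework allocates, the instance is initialized on memory it did not allocate, and scratch memory is only lent while the instance is active. All names are made up for illustration; they are not TI's IALG_* functions, whose real definitions are in the documents linked in this answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical configuration for a moving-sum filter. */
struct msum_params { size_t window; };

/* One memory request, loosely in the spirit of a memory-table entry. */
struct mem_req {
    size_t size;
    int scratch;  /* 1 = may be shared between instances */
};

/* Step 1: report memory needs; the algorithm allocates nothing itself. */
size_t msum_mem_requests(const struct msum_params *p, struct mem_req reqs[2])
{
    reqs[0].size = p->window * sizeof(int);  /* persistent history buffer */
    reqs[0].scratch = 0;
    reqs[1].size = 64 * sizeof(int);         /* work area (unused by this toy filter) */
    reqs[1].scratch = 1;
    return 2;
}

struct msum {
    int *history;  /* persistent, owned by this instance */
    int *work;     /* scratch, valid only while active */
    size_t window, pos;
    long sum;
};

/* Step 2: initialize an instance on memory the framework provided. */
void msum_init(struct msum *m, const struct msum_params *p, void *persistent)
{
    m->history = persistent;
    m->window = p->window;
    m->pos = 0;
    m->sum = 0;
    m->work = NULL;
    for (size_t i = 0; i < p->window; i++)
        m->history[i] = 0;
}

/* Step 3: scratch memory is lent on activation and taken back afterwards. */
void msum_activate(struct msum *m, void *scratch) { m->work = scratch; }
void msum_deactivate(struct msum *m) { m->work = NULL; }

long msum_process(struct msum *m, int sample)
{
    assert(m->work != NULL);  /* must be active */
    m->sum += sample - m->history[m->pos];
    m->history[m->pos] = sample;
    m->pos = (m->pos + 1) % m->window;
    return m->sum;
}
```

Because the instance never calls malloc itself, the framework is free to place, share, or move its buffers, which is the point of the design described above.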
Also, there are tools to validate implementations of the generic APIs. 3rd parties can request official validation of them from TI.
Some relevant links: spru352g.pdf, spru360e.pdf.

Non-linear Least Squares Optimization Library for C [closed]

Closed 5 years ago.
I'm looking for a library in C that will do optimization of an objective function (preferably with the Levenberg-Marquardt algorithm) and will support box constraints, linear inequality constraints and non-linear inequality constraints.
I've tried several libraries already, but none of them supports all the constraint types my application needs:
GNU GSL (does not support constraints at all)
cMPFIT (only supports box constraints)
levmar (does not support non-linear constraints at all)
I am currently exploring NLopt, but I'm not sure if I can achieve a least-squares approach with any of the supplied algorithms.
I find it hard to believe that there's not a single library supporting the full range of constraints in this problem, so I guess I made a mistake somewhere while googling.
I recently discovered I can call Matlab functions from C. While that would solve the problem quite easily, I don't want to have to call Matlab functions from C. It's not fast in my experience.
Any help will be greatly appreciated.
Some time ago I was researching the state of C/C++ least squares fitting libraries. I noted down a few links, including the ones you gave and also:
ALGLIB/optimization -- Lev-Mar with boundary constraints.
WNLIB/wnnlp -- a constrained non-linear optimization package in C (general optimization, not least squares). Constraints are handled by adding a penalty function.
I haven't used any of the libraries yet, but NLopt seems the most promising for me. It would be great if it had specialized interface and algorithms for (weighted) least-squares fitting.
BTW, does your note about Matlab mean that it has Lev-Mar with non-linear constraints?
The approach I finally followed is the following:
I used NLopt for the optimization and the objective function was constructed to compute the squared error of the problem.
The algorithm that showed the most promising results was COBYLA (local derivative-free optimization). It supports box constraints and non-linear constraints. The linear inequality constraints were introduced as non-linear constraints, which should generally be feasible.
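A sketch of that setup, assuming a hypothetical straight-line model y = a*x + b and a hypothetical linear constraint a + b <= 1. Both functions follow NLopt's C callback shape `double f(unsigned n, const double *x, double *grad, void *data)`, and NLopt treats an inequality constraint as satisfied when the function returns a value <= 0. COBYLA is derivative-free, so NLopt passes grad as NULL; the gradients are filled only if a gradient-based algorithm asks for them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data set for fitting y = a*x + b. */
struct fit_data {
    const double *x, *y;
    size_t m;
};

/* Objective: sum of squared residuals, in NLopt's objective shape. */
double sse_objective(unsigned n, const double *p, double *grad, void *data)
{
    const struct fit_data *d = data;
    assert(n == 2);
    double sse = 0.0, ga = 0.0, gb = 0.0;
    for (size_t i = 0; i < d->m; i++) {
        double r = p[0] * d->x[i] + p[1] - d->y[i];
        sse += r * r;
        ga += 2.0 * r * d->x[i];
        gb += 2.0 * r;
    }
    if (grad) {  /* NULL for derivative-free algorithms such as COBYLA */
        grad[0] = ga;
        grad[1] = gb;
    }
    return sse;
}

/* Linear inequality a + b <= 1, rewritten as fc(p) <= 0. */
double sum_constraint(unsigned n, const double *p, double *grad, void *data)
{
    (void)n;
    (void)data;
    if (grad) {
        grad[0] = 1.0;
        grad[1] = 1.0;
    }
    return p[0] + p[1] - 1.0;
}
```

With NLopt these would be registered along the lines of nlopt_create(NLOPT_LN_COBYLA, 2), nlopt_set_min_objective(opt, sse_objective, &data), nlopt_add_inequality_constraint(opt, sum_constraint, NULL, 1e-8), plus nlopt_set_lower_bounds / nlopt_set_upper_bounds for the box constraints, and then nlopt_optimize(opt, p, &minf).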
Simple benchmarking shows that it converges a little more slowly than a Lev-Mar approach, but that speed is the price of supporting the constraints.
MPFIT: A MINPACK-1 Least Squares Fitting Library in C
MPFIT uses the Levenberg-Marquardt technique to solve the least-squares problem. In its typical use, MPFIT will be used to fit a user-supplied function (the "model") to user-supplied data points (the "data") by adjusting a set of parameters. MPFIT is based upon MINPACK-1 (LMDIF.F) by Moré and collaborators.
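In that typical use, the user function's job is to turn the current parameters into weighted deviates between data and model. The sketch below assumes the user-function convention from cmpfit's documentation as we recall it, `int f(int m, int n, double *p, double *deviates, double **derivs, void *private_data)`; check mpfit.h before relying on it. The straight-line model and xy_data struct are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data passed through MPFIT's private-data pointer. */
struct xy_data {
    const double *x, *y, *ey;  /* ey: measurement uncertainties */
};

/* Fill deviates[i] = (data - model) / error for each of the m points,
 * for the model y = p[0] + p[1] * x. */
int line_deviates(int m, int n, double *p, double *deviates,
                  double **derivs, void *private_data)
{
    const struct xy_data *d = private_data;
    (void)derivs;  /* not filled: this sketch relies on numerical derivatives */
    assert(n == 2);
    for (int i = 0; i < m; i++) {
        double model = p[0] + p[1] * d->x[i];
        deviates[i] = (d->y[i] - model) / d->ey[i];
    }
    return 0;  /* 0 signals success */
}
```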
http://cow.physics.wisc.edu/~craigm/idl/cmpfit.html
OPTIF9 can be converted to C (from Fortran) and may already have been by somebody.
If what you mean by box constraints is that it supports upper and lower limits on parameter values, I believe there is a version that does that.
That is a tricky problem, because it means whenever a parameter gets to a boundary, it effectively reduces the degrees of freedom by 1.
It can get "stuck on a wall" when you didn't really want it to.
What we've found is that it's better to use an unconstrained minimizer and transform parameters, via something like a log or logit transform, so that in the search space they are unconstrained, but in the model space they are constrained.
As for the other types of constraints, I don't know, although one option is to make your objective function get really bad when constraints are violated, so the optimizer avoids those areas.
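A minimal sketch of that penalty idea, with a hypothetical base objective (squared distance from (1, 1)) and a hypothetical constraint p0 + p1 <= 1. The penalty is zero inside the feasible region and grows quadratically with the violation.

```c
#include <assert.h>

/* Hypothetical constraint g(p) = p0 + p1 - 1 <= 0; returns the violation. */
static double constraint_violation(const double p[2])
{
    double g = p[0] + p[1] - 1.0;
    return g > 0.0 ? g : 0.0;  /* 0 when satisfied */
}

/* Hypothetical base objective: squared distance from (1, 1). */
static double base_objective(const double p[2])
{
    double a = p[0] - 1.0, b = p[1] - 1.0;
    return a * a + b * b;
}

/* Penalized objective for an unconstrained minimizer. */
double penalized_objective(const double p[2], double weight)
{
    double v = constraint_violation(p);
    return base_objective(p) + weight * v * v;
}
```

With a finite weight the minimum can sit slightly outside the feasible region, so in practice the weight is either large or increased over successive runs.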
I've found when I have a really flexible set of constraints, if I want a good trouble-free algorithm, I use Metropolis-Hastings.
Unless I'm wrong, if it proposes a sample that violates the constraints, you can simply reject it (the chain stays at its current point for that step).
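A sketch of random-walk Metropolis-Hastings with a constraint, assuming a hypothetical one-dimensional target proportional to x(1-x)^2 and the constraint 0 <= x <= 1. An infeasible proposal counts as a rejection, so the current point is recorded again; redrawing until a proposal happens to be feasible would change the proposal distribution near the boundary and bias the samples. A small xorshift64 generator keeps runs reproducible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small deterministic PRNG (xorshift64) so runs are reproducible. */
static uint64_t rng_state = 88172645463325252ULL;

static double uniform01(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return (double)(rng_state >> 11) / 9007199254740992.0;  /* 53 bits -> [0, 1) */
}

/* Hypothetical unnormalized target density; only meaningful on [0, 1]. */
static double target(double x) { return x * (1.0 - x) * (1.0 - x); }

/* Hypothetical constraint on the parameter. */
static int feasible(double x) { return x >= 0.0 && x <= 1.0; }

/* Random-walk Metropolis-Hastings restricted to the feasible set. */
void mh_sample(double x0, double step, double *out, size_t n)
{
    assert(feasible(x0) && target(x0) > 0.0);
    double x = x0, fx = target(x0);
    for (size_t i = 0; i < n; i++) {
        double y = x + step * (2.0 * uniform01() - 1.0);  /* symmetric proposal */
        if (feasible(y)) {
            double fy = target(y);
            if (uniform01() * fx < fy) {  /* accept with probability min(1, fy/fx) */
                x = y;
                fx = fy;
            }
        }
        out[i] = x;  /* on rejection the current point is repeated */
    }
}
```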
It takes longer, but it's simple and always works.

Automated tracing use of variables within source code

I'm working with a set of speech processing routines (written in C) meant to be compiled with the mex command on MATLAB. There is this C-function which I'm interested in accelerating using FPGA.
The hardware takes in specified input parameters through input ports, treats the rest of the inputs as hard-coded constants, and passes a particular variable somewhere within the C function, say foo, to the output port.
I am interested in tracing the computation graph (unsure if this is the right term) of foo, i.e. how foo relates to intermediate computed variables, which in turn eventually depend on input parameters and hard-coded constants. This would allow me to flatten the logic so it can be coded in a hardware description language, and to remove irrelevant logic that does not affect the value of foo. The catch is that some intermediate variables are global, so tracing them is a headache.
Is there an automated tool which analyzes a given set of C headers and source files and provide a means of tracing how a specified variable is altered, with some kind of dependency graph of all variables used?
I think what you are looking for is a tool to do value analysis.
Among the tools available to do this, I think Code Surfer is probably the best out there. Of course, it is also quite expensive but if you are a student, they do have an academic license program. On the open-source side, Frama-C can also do this in a more limited fashion and has a much, much steeper learning curve. But it is free and will get you where you want to go.
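This is not what CodeSurfer or Frama-C do internally, but a hand-built sketch of the underlying idea: once each variable's direct dependencies are known (including through globals), everything that can affect foo is what is reachable backwards from foo, a backward slice, and the rest is logic that can be dropped. The variables and the dependency table below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variables of the C function being analyzed. */
enum { IN_A, IN_B, K_GAIN, G_TMP, UNUSED_LOG, FOO, NVARS };

/* deps[v][u] != 0 means "v is computed directly from u". */
static const unsigned char deps[NVARS][NVARS] = {
    [G_TMP]      = { [IN_A] = 1, [K_GAIN] = 1 },  /* global g_tmp = in_a * K */
    [UNUSED_LOG] = { [IN_B] = 1 },                /* only feeds logging */
    [FOO]        = { [G_TMP] = 1 },               /* foo = g_tmp + 1 */
};

/* Depth-first walk marking everything v transitively depends on. */
static void mark(size_t v, unsigned char in_slice[NVARS])
{
    if (in_slice[v])
        return;
    in_slice[v] = 1;
    for (size_t u = 0; u < NVARS; u++)
        if (deps[v][u])
            mark(u, in_slice);
}

/* Compute the backward slice of target. */
void slice_of(size_t target, unsigned char in_slice[NVARS])
{
    for (size_t v = 0; v < NVARS; v++)
        in_slice[v] = 0;
    mark(target, in_slice);
}
```

The hard part, which the tools automate, is building the dependency table from real C code with pointers and globals; the slice computation itself is simple graph reachability.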

How can I implement cooperative lightweight threading with C on Mac OS X?

I'm trying to find a lightweight cooperative threading solution to try implementing an actor model.
As far as I know, the only solution is setcontext/getcontext, but the functionality appears to be deprecated by Apple. I'm confused about why they did this; in any case, I'm looking for a replacement.
Pthreads are not an option because I need cooperative model instead of preemptive model to control context switching timing precisely/manually without expensive locking.
-- edit --
Reason for avoiding pthreads:
Because pthreads are not cooperative/deterministic and are too expensive. I need an actor model for game logic code, so thousands of execution contexts are required at minimum. Each OS thread requires megabytes of memory and is expensive to create and destroy. And parallelism is not important; I just need concurrent execution of many functions. This could be implemented with many divided functions and some kind of object model, but my goal is to reduce those overheads.
If I've got something wrong, please correct me. It will be much appreciated.
The obvious 'lightweight' solution is to avoid complex nested calling except for limited situations where the execution time will be tightly bounded, then store an explicit state structure for each "thread" and implement the main program logic as a state machine that's easily suspendable/resumable at most points. Then you can simply swap out the pointer to the state structure for 'context switch'. Basically this technique amounts to keeping all of your important state variables, including what would conventionally be local variables, in the state structure.
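A minimal sketch of that technique with a hypothetical task: the conventional loop `for (i = 0; i < n; i++) { acc += i * i; yield(); }` becomes a step function whose would-be locals live in a struct, and a "context switch" is just calling the step function with a different struct pointer.

```c
#include <assert.h>

/* All of the "thread's" would-be local variables live here. */
struct task {
    enum { TASK_START, TASK_LOOP, TASK_DONE } state;
    int i, n;   /* formerly loop locals */
    long acc;   /* formerly a local accumulator */
};

/* Runs one slice of work and returns instead of blocking.
 * Returns 1 while there is more work, 0 when finished. */
int task_step(struct task *t)
{
    switch (t->state) {
    case TASK_START:
        t->i = 0;
        t->acc = 0;
        t->state = TASK_LOOP;
        return 1;
    case TASK_LOOP:
        if (t->i >= t->n) {
            t->state = TASK_DONE;
            return 0;
        }
        t->acc += (long)t->i * t->i;
        t->i++;
        return 1;  /* the equivalent of yield() */
    case TASK_DONE:
    default:
        return 0;
    }
}
```

A scheduler simply loops over its tasks calling task_step() until all of them return 0; no stacks are switched and no locks are needed.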
Whether this is worthwhile probably depends on your reason for avoiding pthreads. If your reason is to be portable to non-POSIX systems, or if you really need deterministic program flow, then it may be worthwhile. But if you're just worried about performance overhead and memory synchronization issues, I think you should use pthreads and manage these issues. If you avoid unnecessary locking, use fine-grained locks, and minimize the amount of time locks are held, performance should not suffer.
Edit: Based on your further details posted in the comments on the main question, I think the solution I've proposed is the right one. Each actor should have their own context in which you store the state of the actor's action/thinking/etc. You would have a run_actor function which would take an actor context and a number of "ticks" to advance the actor's state by, and a run_all_actors function which would iterate over a list of active actors and call run_actor for each with the specified number of ticks.
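A sketch of the run_actor / run_all_actors pair described, with a hypothetical actor context (a position, a velocity and an alive flag) and a hypothetical rule that an actor dies when it leaves a 0..100 arena. An array with an alive flag stands in for the "list of active actors".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical actor context: everything the actor needs between ticks. */
struct actor {
    int x, vx;  /* position and velocity */
    int alive;
};

/* Advance one actor by the given number of ticks. */
void run_actor(struct actor *a, int ticks)
{
    for (int t = 0; t < ticks && a->alive; t++) {
        a->x += a->vx;
        if (a->x < 0 || a->x > 100)  /* hypothetical rule: leaving the arena kills it */
            a->alive = 0;
    }
}

/* Give every active actor the same number of ticks. */
void run_all_actors(struct actor *actors, size_t n, int ticks)
{
    for (size_t i = 0; i < n; i++)
        if (actors[i].alive)
            run_actor(&actors[i], ticks);
}
```

Because each actor's state is self-contained, splitting the array between worker threads later only requires care where one actor reads another's context.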
Further, note that this solution still allows you to use real threads to take advantage of SMP/multicore machines. You simply divide the actors up between threads. You may need some degree of locking if one actor needs to examine another's context (e.g. for collision detection).
I was researching this question as well, and I ran across GNU Pth (not to be confused with Pthreads). See http://www.gnu.org/software/pth/
It aims to be a portable solution for cooperative threads. It does mention it is implemented via setcontext/getcontext if available (so it may not be on Mac OSX). Otherwise it says it uses longjmp/setjmp, but it's not clear to me how that works.
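For reference, this is roughly what cooperative switching with the POSIX ucontext functions looks like. These functions were marked obsolescent in POSIX.1-2004 and removed from POSIX.1-2008, which is likely why Apple deprecated them; on macOS they also require _XOPEN_SOURCE and produce deprecation warnings. The sketch is single-threaded, uses a fixed 64 KiB stack, and its names are illustrative.

```c
#define _XOPEN_SOURCE 700  /* required on macOS for the (deprecated) ucontext API */
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, coro_ctx;
static char coro_stack[64 * 1024];  /* the coroutine's own stack */
static int trace[8];
static int trace_len;

/* Body of the cooperative "thread": does a bit of work, then yields. */
static void coro_body(void)
{
    for (int i = 0; i < 3; i++) {
        trace[trace_len++] = i;
        swapcontext(&coro_ctx, &main_ctx);  /* yield back to the caller */
    }
}

/* Interleaves the coroutine with the caller three times;
 * returns the number of trace entries, or -1 on error. */
int run_coroutine_demo(void)
{
    trace_len = 0;
    if (getcontext(&coro_ctx) == -1)
        return -1;
    coro_ctx.uc_stack.ss_sp = coro_stack;
    coro_ctx.uc_stack.ss_size = sizeof coro_stack;
    coro_ctx.uc_link = &main_ctx;  /* where control goes if coro_body returns */
    makecontext(&coro_ctx, coro_body, 0);

    for (int i = 0; i < 3; i++) {
        swapcontext(&main_ctx, &coro_ctx);  /* resume the coroutine */
        trace[trace_len++] = 100 + i;
    }
    return trace_len;
}
```

The trace alternates 0, 100, 1, 101, 2, 102: every switch happens exactly where the code asks for it, which is the deterministic behavior the question wants.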
Hope this is helpful to anyone who searches for this question.
I have discovered that some of the required functionality from setcontext/getcontext is implemented in libunwind.
Unfortunately the library won't compile on Mac OS X because setcontext/getcontext are deprecated. However, Apple has implemented its own libunwind, which is source-compatible with GNU's implementation. The library exists on Mac OS X 10.6, 10.7, and iOS. (I don't know the exact iOS version.)
This library is not documented, but I found the headers in these locations:
/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS5.0.sdk/usr/include/libunwind.h
/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator4.3.sdk/usr/include/libunwind.h
/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator5.0.sdk/usr/include/libunwind.h
/Developer/SDKs/MacOSX10.6.sdk/usr/include/libunwind.h
/Developer/SDKs/MacOSX10.7.sdk/usr/include/libunwind.h
There was a note in the header file directing readers to the GNU libunwind site for documentation.
I'll bet on the library.

Why do safety requirements like to discourage use of AI?

Safety requirements seem to discourage systems that use AI for safety-related functions (particularly where there are large potential risks of destruction or death). Can anyone suggest why? I always thought that, provided you program your logic properly, the more intelligence you put into an algorithm, the more likely it is to be capable of preventing a dangerous situation. Are things different in practice?
Most AI algorithms are fuzzy, typically learning as they go along. For items of critical safety importance, what you want is determinism. Deterministic algorithms are easier to prove correct, which is essential for many safety-critical applications.
I would think that the reason is twofold.
First it is possible that the AI will make unpredictable decisions. Granted, they can be beneficial, but when talking about safety-concerns, you can't take risks like that, especially if people's lives are on the line.
The second is that the "reasoning" behind the decisions can't always be traced (sometimes there is a random element used for generating results with an AI) and when something goes wrong, not having the ability to determine "why" (in a very precise manner) becomes a liability.
In the end, it comes down to accountability and reliability.
The more complex a system is, the harder it is to test.
And the more crucial a system is, the more important it becomes to have 100% comprehensive tests.
Therefore for crucial systems people prefer to have sub-optimal features, that can be tested, and rely on human interaction for complex decision making.
From a safety standpoint, one often is concerned with guaranteed predictability/determinism of behavior and rapid response time. While it's possible to do either or both with AI-style programming techniques, as a system's control logic becomes more complex it's harder to provide convincing arguments about how the system will behave (convincing enough to satisfy an auditor).
I would guess that AI systems are generally considered more complex. Complexity is usually a bad thing, especially when it relates to "magic" which is how some people perceive AI systems.
That's not to say that the alternative is necessarily simpler (or better).
When we've done control systems coding, we've had to show trace tables for every single code path and every permutation of inputs. This was required to ensure that we didn't put equipment into a dangerous state (for employees or infrastructure), and to "prove" that the programs did what they were supposed to do.
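As a toy illustration of such a trace table, here is a hypothetical three-input interlock checked against a safety property for every permutation of its inputs. With three boolean inputs there are 2^3 = 8 rows, and each additional input doubles the count, which is why this is only practical for small, deterministic logic.

```c
#include <assert.h>

/* Hypothetical interlock: the motor may run only if the guard is closed,
 * the emergency stop is not pressed, and the operator requests it. */
static int motor_enable(int guard_closed, int estop_pressed, int run_request)
{
    return guard_closed && !estop_pressed && run_request;
}

/* Check the safety property on every input permutation (every row of
 * the trace table). Returns the number of rows checked. */
int check_all_inputs(void)
{
    int rows = 0;
    for (int bits = 0; bits < 8; bits++) {
        int guard = (bits >> 0) & 1;
        int estop = (bits >> 1) & 1;
        int req = (bits >> 2) & 1;
        int out = motor_enable(guard, estop, req);
        /* Never enabled with the guard open or the e-stop pressed. */
        assert(!(out && (!guard || estop)));
        rows++;
    }
    return rows;
}
```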
That would be awfully tricky to do if the program were fuzzy and non-deterministic, as tvanfosson indicated. I think you should accept that answer.
The key statement is "provided you program your logic properly". Well, how do you "provide" that? Experience shows that most programs are chock full of bugs.
The only way to guarantee that there are no bugs would be formal verification, but that is practically infeasible for all but the most primitively simple systems, and (worse) is usually done on specifications rather than code, so you still don't know whether the code correctly implements your spec after you've proven the spec to be flawless.
I think that is because AI is very hard to understand, which makes it impossible to maintain.
Even if an AI program is considered fuzzy, or "learns" from the moment it is released, it is very well tested against all known cases (and has already learned from them) before it's even finished. In most cases this "learning" changes some thresholds or weights in the program, and after that it is very hard to really understand and maintain that code, even for its creators.
This has been changing over the last 30 years with the creation of languages that are easier for mathematicians to understand, making it easier for them to test and to deliver new pseudo-code around the problem (like the MATLAB AI toolbox).
As there is no accepted definition of AI, the question should be more specific.
My answer is on adaptive algorithms merely employing parameter estimation - a kind of learning - to improve the safety of the output information. Even this is not welcome in functional safety although it may seem that the behaviour of a proposed algorithm is not only deterministic (all computer programs are) but also easy to determine.
Be prepared for the assessor asking you to demonstrate test reports covering all combinations of input data and failure modes. Your algorithm being adaptive means it depends not only on current input values but on many or all of the earlier values. You know that a full test coverage is impossible within the age of the universe.
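A tiny hypothetical example of why adaptivity defeats exhaustive testing: an exponential moving average (about the simplest parameter estimator) gives different outputs for the same current input depending on earlier inputs. The test space is therefore input histories, not inputs; with, say, 12-bit inputs and a 10-sample memory there are already 2^120 distinct histories.

```c
#include <assert.h>

/* Hypothetical adaptive estimator: an exponential moving average whose
 * output depends on every earlier input, not just the current one. */
struct ema {
    double estimate;
    double alpha;  /* adaptation rate, 0 < alpha <= 1 */
};

double ema_update(struct ema *e, double input)
{
    e->estimate += e->alpha * (input - e->estimate);
    return e->estimate;
}
```

Feeding 0 then 4 into one estimator and 8 then 4 into another (both starting at 0 with alpha = 0.5) yields 2 and 4 respectively for the same final input of 4.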
One way to make your case is to show that previously accepted, simpler algorithms (the state of the art) are not safe. This should be easy if you know your problem space (if not, keep away from AI).
Another possibility may exist for your problem: a compelling monitoring function indicating whether the parameter is estimated accurately.
There are enough ways that ordinary algorithms, when shoddily designed and tested, can wind up killing people. If you haven't read about it, you should look up the case of Therac 25. This was a system where the behaviour was supposed to be completely deterministic, and things still went horribly, horribly wrong. Imagine if it were trying to reason "intelligently", too.
"Ordinary algorithms" for a complex problem space tend to be awkward. On the other hand, some "intelligent" algorithms have a simple structure. This is especially true for applications of Bayesian inference. You just have to know the likelihood function(s) for your data (plural applies if the data separates into statistically independent subsets).
Likelihood functions can be tested. If the test cannot cover the tails far enough to reach the required confidence level, just add more data, for example from another sensor. The structure of your algorithm will not change.
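A sketch of that structure, assuming hypothetical Gaussian sensor noise and a uniform prior over a grid of 101 candidate true values. Each independent sensor contributes by adding its log-likelihood to the running log-posterior, so adding a sensor is one more call to the same function, and the algorithm's structure stays the same.

```c
#include <assert.h>
#include <stddef.h>

#define NH 101  /* hypotheses: true value 0.00, 0.01, ..., 1.00 */

/* Gaussian log-likelihood of one reading given a hypothesized true value,
 * up to an additive constant; sigma is the sensor's noise level. */
static double gauss_loglik(double reading, double truth, double sigma)
{
    double z = (reading - truth) / sigma;
    return -0.5 * z * z;
}

/* Add one independent sensor's evidence to the running log-posterior
 * (start from all zeros for a uniform prior). */
void add_reading(double logpost[NH], double reading, double sigma)
{
    for (size_t h = 0; h < NH; h++)
        logpost[h] += gauss_loglik(reading, (double)h / (NH - 1), sigma);
}

/* Index of the most probable hypothesis (the MAP estimate). */
size_t map_index(const double logpost[NH])
{
    size_t best = 0;
    for (size_t h = 1; h < NH; h++)
        if (logpost[h] > logpost[best])
            best = h;
    return best;
}
```

Testing then reduces to testing gauss_loglik (or whatever likelihood each sensor actually has) against that sensor's measured error distribution.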
A drawback is/was the CPU performance required for Bayesian inference.
Besides, mentioning Therac 25 is not helpful, since no algorithm at all was involved, just multitasking spaghetti code. Citing the authors, "[the] accidents were fairly unique in having software coding errors involved -- most computer-related accidents have not involved coding errors but rather errors in the software requirements such as omissions and mishandled environmental conditions and system states."
