Saving execution state -- something like protothreads but without using labels as values? - c

I'm trying to find a way to serialize execution/stack state, in such a way that the state can be archived and restored at a later time where execution can be made to pick up where it left off. Sort of like continuations, but with the feature that the stack state should be able to be serialized to disk and rehydrated on subsequent runs. I'm working in C (and/or Objective-C, if that helps.)
Protothreads looked somewhat close to what I'm looking for, but uses the GCC labels-as-values extension to resume from stored state. It seems to me that this is not likely to be robust for serialization/deserialization of the state across different compilations (and probably not even across runs of the same binary in the presence of ASLR.) In the abstract, there are going to be versioning challenges to be sure, but it doesn't seem like protothreads even gets that far.
I guess the canonical approach here would be to convert the code to a state machine, and then serialize/deserialize the state of the SM, but I find it a lot easier to think in terms of code, and struggle to envision how one would go about translating large swaths of non-trivial code into a state machine (even assuming that said code already has no global state or non-serializable heap structures, etc.)
It seems likely that any viable approach is going to require you to make allowances for it in your code. I imagine you'd have to store all your stack state in a "managed" stack structure that mirrored the stack, etc. I suspect it won't come "free", but I guess I'm hoping someone has already invented this wheel.
Anyone know of tools/libraries that address this problem, or alternately some method for converting arbitrary (but conforming) code into a serializable state machine? I found ragel, which goes the other way (SM specification -> code), but nothing generalized going this way.
I have subsequently discovered that C#'s compiler converts yield-based IEnumerator implementations into state machines in private/anonymous inner classes, so I guess it is possible to automate this for arbitrary code. Anyone know of a generalized method of doing this? (i.e. assuming you could structure the code like a yielding iterator.)
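For illustration, here is the rough shape such a hand conversion can take (a minimal sketch, all names hypothetical): the locals and the resume point live in an explicit struct instead of on the stack, the function body becomes a switch on the resume point, and because the resume point is a small integer rather than a code address, the struct can simply be written to disk and read back later, surviving ASLR and recompilation (versioning of the struct layout aside).

#include <stdio.h>
#include <stdint.h>

/* Hypothetical "resumable" task: all of its execution state lives here. */
enum task_point { TASK_START = 0, TASK_LOOP, TASK_FINISH, TASK_DONE };

struct task_state {
    int32_t point;   /* resume point -- a plain value, not a code address */
    int32_t i;       /* "locals" that would normally live on the stack    */
    int32_t total;
};

/* Run until the next yield point; returns 0 when finished. */
int task_step(struct task_state *s)
{
    switch (s->point) {
    case TASK_START:
        s->total = 0;
        s->i = 0;
        s->point = TASK_LOOP;
        return 1;                          /* "yield" */
    case TASK_LOOP:
        if (s->i < 10) {
            s->total += s->i++;
            return 1;                      /* yield once per iteration */
        }
        s->point = TASK_FINISH;
        return 1;
    case TASK_FINISH:
        printf("total = %d\n", s->total);
        s->point = TASK_DONE;
        return 0;
    default:
        return 0;
    }
}

/* Archiving the "stack frame" is now just writing the struct out. */
int task_save(const struct task_state *s, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t ok = fwrite(s, sizeof *s, 1, f);
    fclose(f);
    return ok == 1 ? 0 : -1;
}

The obvious cost is exactly the one anticipated above: every local that must survive a yield has to be promoted into the struct by hand.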

Related

C design pattern performing a list of actions without blocking?

Embedded C. I have a list of things I want to do, procedurally, mostly READ and WRITE and MODIFY actions, acting on the results of the last statement. They can take up to 2 seconds each, and I can't block.
Each action can have states of COMPLETE and ERROR, where ERROR has sub-states for the reason the error occurred. On COMPLETE I'll want to check or modify some data.
Each list of actions is a big switch, and to re-enter it I keep track of which step I'm on; on success I do step++ and come back in further down the list next time.
Pretty simple, but I’m finding that to not block I’m spending a ton of effort checking states and errors and edges constantly. Over and over.
I would say 80% of my code is just checks and moving the system along. There has to be a better way!
Are there any design patterns for async do thing and come back later for results in a way that efficiently handles some of the exception/edge/handling?
Edit: I know how to use callbacks but don't really see that as "a solution" as I just need to get back to a different part of the same list for the next thing to do. Maybe it would be beneficial to know how async and await work behind the scenes in other languages?
Edit2: I do have an RTOS for other projects but this specific question, assume no threads/tasks, just bare metal superloop.
Your predicament is a perfect fit for state machines (really, probably UML statecharts). Each request can be handled by its own state machine, which handles events (such as COMPLETE or ERROR indications) in a non-blocking, run-to-completion manner. As the events come in, the request's state machine moves through its different states towards completion.
For embedded systems, I often use the QP event-driven framework for such cases. In fact, when I looked up this link, I noticed the very first paragraph uses the term "non-blocking". The framework provides much more than state machines with hierarchy (states within states), which is already very powerful.
The site also has some good information on approaches to your specific problem. I would suggest starting with the site's Key Concepts page.
To give you a taste of the content and its relevance to your predicament:
In spite of the fundamental event-driven nature, most embedded systems are traditionally programmed in a sequential manner, where a program hard-codes the expected sequence of events by waiting for the specific events in various places in the execution path. This explicit waiting for events is implemented either by busy-polling or blocking on a time-delay, etc.
The sequential paradigm works well for sequential problems, where the expected sequence of events can be hard-coded in the sequential code. Trouble is that most real-life systems are not sequential, meaning that the system must handle many equally valid event sequences. The fundamental problem is that while a sequential program is waiting for one kind of event (e.g., timeout event after a time delay) it is not doing anything else and is not responsive to other events (e.g., a button press).
For this and other reasons, experts in concurrent programming have learned to be very careful with various blocking mechanisms of an RTOS, because they often lead to programs that are unresponsive, difficult to reason about, and unsafe. Instead, experts recommend [...] event-driven programming.
You can also do state machines yourself without using an event-driven framework like the QP, but you will end up re-inventing the wheel IMO.
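For completeness, a bare-bones hand-rolled version (names hypothetical; start_read() and friends stand in for whatever non-blocking operations you kick off) is just a state enum plus a dispatch function that the superloop feeds events into:

/* Hypothetical hand-rolled state machine for one READ -> MODIFY -> WRITE request. */
enum req_state { REQ_IDLE, REQ_READING, REQ_MODIFYING, REQ_WRITING, REQ_FAILED, REQ_DONE };
enum req_event { EVT_START, EVT_COMPLETE, EVT_ERROR };

/* Hypothetical non-blocking operation starters, defined elsewhere. */
void start_read(void);
void start_modify(void);
void start_write(void);

struct request {
    enum req_state state;
    int error_reason;            /* "sub-state" recorded on EVT_ERROR */
};

/* Called from the superloop; runs to completion and returns immediately. */
void request_dispatch(struct request *r, enum req_event ev, int reason)
{
    if (ev == EVT_ERROR) {       /* error handling in one place, not per step */
        r->error_reason = reason;
        r->state = REQ_FAILED;
        return;
    }
    switch (r->state) {
    case REQ_IDLE:
        if (ev == EVT_START)    { start_read();   r->state = REQ_READING;   }
        break;
    case REQ_READING:
        if (ev == EVT_COMPLETE) { start_modify(); r->state = REQ_MODIFYING; }
        break;
    case REQ_MODIFYING:
        if (ev == EVT_COMPLETE) { start_write();  r->state = REQ_WRITING;   }
        break;
    case REQ_WRITING:
        if (ev == EVT_COMPLETE) r->state = REQ_DONE;
        break;
    default:
        break;                   /* REQ_DONE / REQ_FAILED: ignore further events */
    }
}

Pulling the error check out of each step and into the dispatcher is what removes most of the repetitive "checking states and errors" code from the happy path.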

C, design: removing global objects

I'm creating a small Avida-style life simulation. I started out with a very basic, everything-is-global 600-line program in a single file to test some ideas, and now I want to create a real design.
Among other things, I had a global configuration object that every other function got something out of. Now, I must localize the object and pass pointers around. Thing is, mostly everyone needs this object. I've thought of three possible solutions:
a) Keep the configuration object global (simplest, though not really a solution)
b) Store pointers everywhere they are needed (easy enough, though a waste of memory, since some small plain-old-data structures would need it).
c) Create factories for the POD types that need access to options, and have the factory perform all operations on them.
Of my ideas, only (c) sounds logical, but I don't want to needlessly complicate the structure. What would you guys do?
I'm fine with new ideas, and will provide whatever information about the program you want to know.
Thanks in advance!
I have to agree with @Carl Norum: there is nothing wrong with the global config setup you have now. You say that everybody "got something out of" it. As you know, the problem with globals comes when everybody writes into them. In your case, the config info truly is needed globally, so it deserves to be global.
If you want to make it a little more decoupled and protected -- a little less global-ish -- then why not add some read/write access routines?
See, storing pointers everywhere isn't going to really solve the problem: it will only add a layer of indirection that will merely disguise or camouflage what are, in reality, the global accesses that are making you nervous. And that extra layer of indirection will add juuuuust enough room for juuuuust a teeny-weeny little bug to creep in.
So, bottom line: if stuff is naturally global then make it global and don't worry about the usual widespread received wisdom that's mostly correct but might not be the right thing in your application. To always be bound by the rules/propaganda that CS teachers put out there is, imo, the perfect example of a foolish consistency.
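To make that concrete, the access routines can be as small as this (a minimal sketch; the config fields are made up):

/* config.h -- hypothetical global configuration behind accessors */
typedef struct {
    int    world_width;
    int    world_height;
    double mutation_rate;
} sim_config;

const sim_config *config_get(void);            /* read access for everyone     */
void config_set_mutation_rate(double rate);    /* writes funneled through here */

/* config.c */
static sim_config g_config = { 100, 100, 0.01 };

const sim_config *config_get(void)          { return &g_config; }
void config_set_mutation_rate(double rate)  { g_config.mutation_rate = rate; }

Readers get a const pointer, so any accidental write shows up at compile time, which buys most of the protection without threading a pointer through every function signature.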
Global variables are awesome. Spend your time actually getting something done instead of refactoring for no reason. Every company I have worked at uses them heavily.
Ask yourself if you're actually gaining anything by moving it to an object you're just passing around everywhere. You might as well save yourself the extra complexity.
Go for B, unless profiling proves it to be a problem. On most machines, the memory required to store a pointer is very, very trivial.

Building a NetHack bot: is Bayesian Analysis a good strategy?

A friend of mine is beginning to build a NetHack bot (a bot that plays the Roguelike game: NetHack). There is a very good working bot for the similar game Angband, but it works partially because of the ease in going back to the town and always being able to scum low levels to gain items.
In NetHack, the problem is much more difficult, because the game rewards ballsy experimentation and is built basically as 1,000 edge cases.
Recently I suggested using some kind of naive bayesian analysis, in very much the same way spam is filtered.
Basically the bot would at first build a corpus, by trying every possible action with every item or creature it finds and storing that information with, for instance, how close to a death, injury or negative effect it was. Over time it seems like you could generate a reasonably playable model.
Can anyone point us in the right direction of what a good start would be? Am I barking up the wrong tree or misunderstanding the idea of bayesian analysis?
Edit: My friend put up a github repo of his NetHack patch that allows python bindings. It's still in a pretty primitive state but if anyone's interested...
Although Bayesian analysis encompasses much more, the Naive Bayes algorithm well known from spam filters is based on one very fundamental assumption: all variables are essentially independent of each other. For instance, in spam filtering each word is usually treated as a variable, so this means assuming that if the email contains the word 'viagra', that knowledge does not affect the probability that it will also contain the word 'medicine' (or 'foo' or 'spam' or anything else). The interesting thing is that this assumption is quite obviously false when it comes to natural language, but it still manages to produce reasonable results.
Now one way people sometimes get around the independence assumption is to define variables that are technically combinations of things (like searching for the token 'buy viagra'). That can work if you know specific cases to look for, but in general, in a game environment, it means that you can't remember anything. So each time you have to move, perform an action, etc., it's completely independent of anything else you've done so far. I would say even for the simplest games, this is a very inefficient way to go about learning the game.
I would suggest looking into using q-learning instead. Most of the examples you'll find are usually just simple games anyway (like learning to navigate a map while avoiding walls, traps, monsters, etc.). Reinforcement learning is a type of online unsupervised learning that does really well in situations that can be modeled as an agent interacting with an environment, like a game (or robots). It does this by trying to figure out what the optimal action is at each state in the environment (where each state can include as many variables as needed, much more than just 'where am I'). The trick then is to maintain just enough state to help the bot make good decisions, without having a distinct point in your state 'space' for every possible combination of previous actions.
To put that in more concrete terms, if you were to build a chess bot you would probably have trouble if you tried to create a decision policy that made decisions based on all previous moves, since the set of all possible combinations of chess moves grows really quickly. Even a simpler model of where every piece is on the board is still a very large state space, so you have to find a way to simplify what you keep track of. But notice that you do get to keep track of some state, so that your bot doesn't just keep trying to make a left turn into a wall over and over again.
The wikipedia article is pretty jargon heavy but this tutorial does a much better job translating the concepts into real world examples.
The one catch is that you do need to be able to define rewards to provide as the positive 'reinforcement'. That is you need to be able to define the states that the bot is trying to get to, otherwise it will just continue forever.
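For reference, the core of tabular q-learning really is just one small update rule; here is a minimal sketch (the state/action encoding is entirely made up, and a real NetHack bot would need a far more compact state representation):

/* Hypothetical tabular Q-learning update. N_STATES/N_ACTIONS and the state
   encoding are placeholders for whatever features the bot tracks. */
#define N_STATES  1024
#define N_ACTIONS 16

static double Q[N_STATES][N_ACTIONS];

static double best_q(int s)
{
    double best = Q[s][0];
    for (int a = 1; a < N_ACTIONS; a++)
        if (Q[s][a] > best) best = Q[s][a];
    return best;
}

/* Called after taking action a in state s, receiving reward r, landing in s2. */
void q_update(int s, int a, double r, int s2)
{
    const double alpha = 0.1;   /* learning rate   */
    const double gamma = 0.9;   /* discount factor */
    Q[s][a] += alpha * (r + gamma * best_q(s2) - Q[s][a]);
}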
There is precedent: the monstrous rog-o-matic program succeeded in playing rogue and even returned with the amulet of Yendor a few times. Unfortunately, rogue was only released as a binary, not source, so it has died (unless you can set up a 4.3BSD system on a MicroVAX), leaving rog-o-matic unable to play any of the clones. It just hangs because they're not close enough emulations.
However, rog-o-matic is, I think, my favourite program of all time, not only because of what it achieved but because of the readability of the code and the comprehensible intelligence of its algorithms. It used "genetic inheritance": a new player would inherit a combination of preferences from a previous pair of successful players, with some random offset, then be pitted against the machine. More successful preferences would migrate up in the gene pool and less successful ones down.
The source can be hard to find these days, but searching "rogomatic" will set you on the path.
I doubt bayesian analysis will get you far because most of NetHack is highly contextual. There are very few actions which are always a bad idea; most are also life-savers in the "right" situation (an extreme example is eating a cockatrice: that's bad, unless you are starving and currently polymorphed into a stone-resistant monster, in which case eating the cockatrice is the right thing to do). Some of those "almost bad" actions are required to win the game (e.g. coming up the stairs on level 1, or deliberately falling in traps to reach Gehennom).
What you could try is doing it at the "meta" level. Design the bot as choosing randomly among a variety of "elementary behaviors". Then try to measure how these bots fare. Then extract the combinations of behaviors which seem to promote survival; bayesian analysis could do that across a wide corpus of games along with their "success level". For instance, if there are behaviors "pick up daggers" and "avoid engaging monsters in melee", I would assume that analysis would show that those two behaviors fit well together: bots which pick daggers up without using them, and bots which try to throw missiles at monsters without gathering such missiles, will probably fare worse.
This somehow mimics what learning gamers often ask in rec.games.roguelike.nethack. Most questions are similar to: "should I drink unknown potions to identify them?" or "what level should my character be before going that deep in the dungeon?". Answers to those questions heavily depend on what else the player is doing, and there is no good absolute answer.
A difficult point here is how to measure the success at survival. If you simply try to maximize the time spent before dying, then you will favor bots which never leave the first levels; those may live long but will never win the game. If you measure success by how deep the character goes before dying then the best bots will be archeologists (who start with a pick-axe) in a digging frenzy.
Apparently there are a good number of Nethack bots out there. Check out this listing:
In nethack unknown actions usually have a boolean effect -- either you gain or you lose. Bayesian networks are based around "fuzzy logic" values -- an action may give a gain with a given probability. Hence, you don't need a bayesian network, just a list of "discovered effects" and whether they are good or bad.
No need to eat the Cockatrice again, is there?
All in all it depends how much "knowledge" you want to give the bot as starters. Do you want him to learn everything "the hard way", or will you feed him spoilers 'till he's stuffed?
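A minimal sketch of that "discovered effects" bookkeeping (all sizes and names hypothetical) is little more than a lookup table:

/* Hypothetical table remembering whether an (action, object) pair was good or bad. */
enum outcome { OUTCOME_UNKNOWN = 0, OUTCOME_GOOD, OUTCOME_BAD };

#define EFFECT_ACTIONS 64
#define EFFECT_OBJECTS 512

static unsigned char effects[EFFECT_ACTIONS][EFFECT_OBJECTS];

void record_effect(int action, int object, enum outcome o)
{
    effects[action][object] = (unsigned char)o;
}

int is_known_bad(int action, int object)
{
    return effects[action][object] == OUTCOME_BAD;   /* no second cockatrice dinner */
}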

How to make your embedded C code immune to requirement changes without adding too much overhead and complexity?

In many embedded applications there is a tradeoff between making the code very efficient or isolating the code from the specific system configuration to be immune to changing requirements.
What kinds of C constructs do you usually employ to achieve the best of both worlds (flexibility and reconfigurability without losing efficiency)?
If you have the time, please read on to see exactly what I am talking about.
When I was developing embedded SW for airbag controllers, we had the problem that we had to change some parts of the code every time the customer changed their mind regarding the specific requirements. For example, the combination of conditions and events that would trigger the airbag deployment changed every couple weeks during development. We hated to change that piece of code so often.
At that time, I attended the Embedded Systems Conference and heard a brilliant presentation by Stephen Mellor called "Coping with changing requirements". You can read the paper here (they make you sign up but it's free).
The main idea of this was to implement the core behavior in your code but configure the specific details in the form of data. The data is something you can change easily and it can even be programmable in EEPROM or a different section of flash.
This idea sounded great to solve our problem. I shared this with my colleague and we immediately started reworking some of the SW modules.
When trying to use this idea in our coding, we encountered some difficulty in the actual implementation. Our code constructs got terribly heavy and complex for a constrained embedded system.
To illustrate this I will elaborate on the example I mentioned above. Instead of having a bunch of if-statements to decide whether the combination of inputs was in a state that required an airbag deployment, we changed to a big table of tables. Some of the conditions were not trivial, so we used a lot of function pointers to be able to call lots of little helper functions which somehow resolved some of the conditions. We had several levels of indirection and everything became hard to understand. To make a long story short, we ended up using a lot of memory, runtime and code complexity. Debugging the thing was not straightforward either. The boss made us change some things back because the modules were getting too heavy (and he was maybe right!).
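For concreteness, here is a heavily simplified sketch of what such a table-of-conditions approach can look like (all names and thresholds hypothetical, and far simpler than a real deployment decision): the function pointers stay in code, while the thresholds are the part that could live in EEPROM or a separate flash section.

#include <stddef.h>

/* Hypothetical sensor snapshot. */
struct sensor_data {
    int decel_x;   /* longitudinal deceleration */
    int decel_y;   /* lateral deceleration      */
};

typedef int (*condition_fn)(const struct sensor_data *d, int threshold);

/* Little helper functions resolving individual conditions. */
static int frontal_decel_exceeds(const struct sensor_data *d, int t) { return d->decel_x > t; }
static int side_decel_exceeds(const struct sensor_data *d, int t)    { return d->decel_y > t; }

struct deploy_rule {
    condition_fn check;       /* stays in code/flash                  */
    int          threshold;   /* the data the customer keeps changing */
};

static const struct deploy_rule rules[] = {
    { frontal_decel_exceeds, 40 },
    { side_decel_exceeds,    25 },
};

int should_deploy(const struct sensor_data *d)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (rules[i].check(d, rules[i].threshold))
            return 1;
    return 0;
}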
PS: There is a similar question in SO but it looks like the focus is different. Adapting to meet changing business requirements?
As another point of view on changing requirements ... requirements go into building the code. So why not take a meta-approach to this:
Separate out parts of the program that are likely to change
Create a script that will glue parts of source together
This way you are maintaining compatible logic-building blocks in C ... and then sticking those compatible parts together at the end:
/* {conditions_for_airbag_placeholder} */
if (require_deployment)
    trigger_gas_release();
Then maintain independent conditions:
/* VAG Condition */
if (poll_vag_collision_event())
    require_deployment = 1;
and another
/* Ford Conditions */
if (ford_interrupt(FRONT_NEARSIDE_COLLISION))
    require_deployment = 1;
Your build script could look like:
BUILD airbag_deployment_logic.c WITH vag_events
TEST airbag_deployment_blob WITH vag_event_emitter
Thinking out loud, really. This way you get a tight binary blob without reading in config.
This is sort of like using overlays http://en.wikipedia.org/wiki/Overlay_(programming) but doing it at compile-time.
Our system is subdivided into many components, with exposed configuration and test points. There is a configuration file that is read at start-up that actually helps us instantiate components, attach them to each other, and configure their behavior.
It's very OO-like, in C, with the occasional hack to implement something like inheritance.
In the defense/avionics world software upgrades are very strictly controlled, and you can't just upgrade SW to fix issues... however, for some bizarre reason you can update a configuration file without a major fight. So it's been darn useful for us to be able to specify a lot of our implementation in those configuration files.
There is no magic, just good separation of concerns when designing the system and a bit of foresight on the part of the developers.
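As a very rough illustration only (every name here is made up, and the real system is of course far richer), the "OO-like in C" part usually boils down to a struct of function pointers plus per-instance state, and "attaching" components is just setting pointers according to what the configuration file says:

/* Hypothetical component interface. */
struct component;

struct component_ops {
    int  (*init)(struct component *self, const char *config_value);
    void (*process)(struct component *self);
};

struct component {
    const struct component_ops *ops;   /* "virtual functions"                   */
    void             *state;           /* per-instance data                     */
    struct component *output;          /* attachment point, set from the config */
};

/* Called while parsing the start-up configuration file. */
int component_attach(struct component *producer, struct component *consumer)
{
    if (producer == NULL || consumer == NULL)
        return -1;
    producer->output = consumer;
    return 0;
}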
What are you trying to save exactly? Effort of code re-work? The red tape of a software version release?
It's possible that changing the code is reasonably straight-forward, and quite possibly easier than changing data in tables. Moving your often-changing logic from code to data is only helpful if, for some reason, it's less effort to modify data rather than code. That might be true if the changes are better expressed in a data form (e.g. numeric parameters stored in EEPROM). Or it might be true if the customer's requests make it necessary to release a new version of software, and a new software version is a costly procedure to build (lots of paperwork, or perhaps OTP chips burned by the chip maker).
Modularity is a very good principle for this sort of thing. It sounds as though you're already doing it to some degree. It's good to aim to isolate the often-changing code to as small an area as possible, and to try to keep the rest of the code ("helper" functions) separate (modular) and as stable as possible.
I don't make the code immune to requirements changes per se, but I always tag a section of code that implements a requirement by putting a unique string in a comment. With the requirements tags in place, I can easily search for that code when the requirement needs a change. This practice also satisfies a CMMI process.
For example, in the requirements document:
The following is a list of requirements related to the RST:
[RST001] Juliet SHALL start the RST with a 5-minute delay when the ignition is turned OFF.
And in the code:
/* Delay for RST when ignition is turned off [RST001] */
#define IGN_OFF_RST_DELAY 5
...snip...
/* Start RST with designated delay [RST001] */
if (IS_ROMEO_ON())
{
rst_set_timer(IGN_OFF_RST_DELAY);
}
I suppose what you could do is to specify several valid behaviors based on a byte or word of data that you could fetch from EEPROM or an I/O port if necessary and then create generic code to handle all possible events described by those bytes.
For instance, if you had a byte that specified the requirements for releasing the airbag it could be something like:
Bit 0: Rear collision
Bit 1: Speed above 55mph (bonus points for generalizing the speed value!)
Bit 2: passenger in car
...
Etc
Then you pull in another byte that says what events happened and compare the two. If they're the same, execute your command; if not, don't.
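A small sketch of that comparison (bit assignments made up to match the list above):

/* Hypothetical bit-coded deployment requirements. */
#define REQ_REAR_COLLISION  (1u << 0)
#define REQ_SPEED_OVER_55   (1u << 1)
#define REQ_PASSENGER       (1u << 2)

/* required: fetched from EEPROM or an I/O port; current: events that happened. */
int should_execute(unsigned char required, unsigned char current)
{
    /* exact match, as described above; a subset test
       ((current & required) == required) is a common variant */
    return current == required;
}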
For adapting to changing requirements I would concentrate on making the code modular and easy to change, e.g. by using macros or inline functions for parameters which are likely to change.
W.r.t. a configuration which can be changed independently from the code, I would hope that the parameters which are reconfigurable are specified in the requirements, too. Especially for safety-critical stuff like airbag controllers.
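For example (the values and the EEPROM read call are hypothetical), the difference is simply whether a parameter is baked in at compile time or fetched from a calibration area at start-up:

/* Compile-time parameter: zero overhead, but changing it means a new build. */
#define RST_DELAY_MIN 5

/* Run-time parameter: read once at start-up from a calibration area.
   eeprom_read_u8() and the address are hypothetical driver details. */
#define PARAM_ADDR_RST_DELAY 0x0010u
unsigned char eeprom_read_u8(unsigned int addr);

static unsigned char rst_delay_min;

void params_init(void)
{
    rst_delay_min = eeprom_read_u8(PARAM_ADDR_RST_DELAY);
}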
Hooking in a dynamic language can be a lifesaver, if you've got the memory and processor power for it.
Have the C talk to the hardware, and then pass up a known set of events to a language like Lua. Have the Lua script parse the event and callback to the appropriate C function(s).
Once you've got your C code running well, you won't have to touch it again unless the hardware changes. All of the business logic becomes part of the script, which in my opinion is a lot easier to create, modify and maintain.
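A minimal sketch of that split, using the standard Lua C API (the event name, script file, and exposed function are all made up):

/* C side: detect a hardware event, hand it to a Lua handler, and let the
   script decide which C functions to call back. */
#include <lua.h>
#include <lauxlib.h>
#include <lualib.h>

static int l_trigger_actuator(lua_State *L)      /* exposed to the script */
{
    int id = (int)luaL_checkinteger(L, 1);
    /* ... talk to the hardware ... */
    (void)id;
    return 0;
}

int run_event(lua_State *L, const char *event_name)
{
    lua_getglobal(L, "handle_event");            /* function defined in the script */
    lua_pushstring(L, event_name);
    return lua_pcall(L, 1, 0, 0);                /* 0 on success */
}

int main(void)
{
    lua_State *L = luaL_newstate();
    luaL_openlibs(L);
    lua_register(L, "trigger_actuator", l_trigger_actuator);
    if (luaL_dofile(L, "logic.lua") != 0)        /* business logic lives in the script */
        return 1;
    run_event(L, "FRONT_COLLISION");
    lua_close(L);
    return 0;
}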

Is it bad practice to "go deep" with your application of callbacks?

Weird question, but I'm not sure if it's anti-pattern or not.
Say I have a web app that will be rendering 1000 records to an html table.
The typical approach I've seen is to send a query down to the database, translate the records in some way to some abstract state (be it an array, or an object, etc.) and place the translated records into a collection that is then iterated over in the view.
As the number of records grows, this approach uses up more and more memory.
Why not send along with the query a callback that performs an operation on each of the translated rows as they are read from the database? This would mean that you don't need to collect the data for further iteration in the view so the memory footprint shrinks, and you're not iterating over the data twice.
There must be something implicitly wrong with this approach, because I rarely see it used anywhere. What's wrong with this approach?
Thanks.
Actually, this is exactly how a well-developed application should behave.
There is nothing wrong with this approach, except that not all database interfaces allow you to do this easily.
If we talk about tabularizing 10 records for yet another social network, there is no need to mess with callbacks if you can get an array of hashes or whatever with a single call that is already implemented for you.
There must be something implicitly wrong with this approach, because I rarely see it used anywhere.
I use it. Frequently. Even when I wouldn't use too much memory repeatedly copying the data, using a callback just seems cleaner. In languages with closures, it also lets you keep relevant code together while factoring out the messy DB stuff.
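To make the shape concrete in C terms (the table, the columns, and the output handling here are all made up, and there is no HTML escaping for brevity): SQLite's sqlite3_exec() is built around exactly this per-row callback, so nothing beyond the current row ever needs to be held in memory.

#include <stdio.h>
#include <sqlite3.h>

/* Called once per row; renders it immediately instead of collecting it. */
static int render_row(void *ctx, int ncols, char **values, char **names)
{
    FILE *out = ctx;
    (void)names;
    fprintf(out, "<tr>");
    for (int i = 0; i < ncols; i++)
        fprintf(out, "<td>%s</td>", values[i] ? values[i] : "");
    fprintf(out, "</tr>\n");
    return 0;                        /* non-zero would abort the query */
}

int render_table(sqlite3 *db, FILE *out)
{
    return sqlite3_exec(db, "SELECT name, email FROM users",
                        render_row, out, NULL);
}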
This is a "limited by your tools" class of problem: Most programming languages don't allow to say "Do something around this code". This was solved in recent years with the advent of closures. Think of a closure as a way to pass code into another method which is then executed in a context. For example, in GSQL, you can write:
def l = []
sql.eachRow("select id from table where time > ?", [time]) { row ->
    l << row[0]
}
This will open a connection to the database, create a statement and a result set, and then run l << row[0] for each row the DB returns. Note that the closure runs inside of sql.eachRow(), yet it can access local variables from the enclosing scope (l) as well as the parameter the call passes to it (row).
With this kind of code, you can even generate the result of an HTTP request on the fly without keeping much of the page in RAM at any time. In my case, I'd stream a 2MB document to the browser using only a few KB of RAM, and the browser would then spend 83s parsing it.
This is roughly what the iterator pattern allows you to do. In many cases this breaks down on the interface between your application and the database. Technologies like LINQ even have solutions that can send back code to the database.
I've found it easier to use an interface resolver than deep callbacks where it's hooked up through several classes. MS has a much fancier version than mine called Unity. This provides a much cleaner way of accessing classes that should not be tightly coupled.
http://www.codeplex.com/unity
