How to store a stack or long array in a database?

I am implementing a depth-first traversal of a large tree. A single traversal can span several days because of the long processing time at each node, and in the meantime the system might crash or shut down.
Therefore I want to make the whole process resumable if it stops partway for some reason. To that end I am planning to back the whole process with a persistent datastore that essentially stores the state of the process.
As far as I can tell, a depth-first traversal needs a stack, which can be realized through a linked-list or array implementation. So my question is whether there is a datastore that can persist a large array while maintaining the order of its entries, so that it can represent a stack. Or is there some other way to maintain the state of my traversal in persistent storage?
Thanks.

IMHO: You can implement a custom stack class using a linked list. This class should be serializable, and its state should be stored intermittently. Then, even if the system crashes, you will lose only the work done since the last save and can recreate the complete structure by deserializing the object from the persistent store.
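As a rough illustration of that idea, here is a minimal sketch in C. It assumes each tree node can be identified by an integer id; the names `id_stack`, `stack_save` and `stack_load` are hypothetical, and the on-disk format (a raw `size_t` count followed by the ids) is only readable on the machine that wrote it. The checkpoint is written to a temporary file and then renamed, so that on POSIX systems a crash during the save leaves the previous checkpoint in place. A real implementation would probably also flush the file to disk before renaming.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stack of node ids that still have to be visited. */
struct id_stack {
    long  *items;
    size_t len;
    size_t cap;
};

/* Push one id, growing the buffer as needed. Returns 0 on success. */
int stack_push(struct id_stack *s, long id)
{
    if (s->len == s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 16;
        long *p = realloc(s->items, ncap * sizeof *p);
        if (!p) return -1;
        s->items = p;
        s->cap = ncap;
    }
    s->items[s->len++] = id;
    return 0;
}

/* Pop the most recently pushed id. The stack must not be empty. */
long stack_pop(struct id_stack *s)
{
    assert(s->len > 0);
    return s->items[--s->len];
}

/* Checkpoint: write to "<path>.tmp", then rename it over the old file. */
int stack_save(const struct id_stack *s, const char *path)
{
    char tmp[512];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || n >= (int)sizeof tmp) return -1;

    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    int ok = fwrite(&s->len, sizeof s->len, 1, f) == 1 &&
             (s->len == 0 ||
              fwrite(s->items, sizeof *s->items, s->len, f) == s->len);
    if (fclose(f) != 0) ok = 0;
    if (!ok) { remove(tmp); return -1; }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Restore the stack from the last checkpoint. Returns 0 on success. */
int stack_load(struct id_stack *s, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = 0;
    long *p = NULL;
    int ok = fread(&n, sizeof n, 1, f) == 1;
    if (ok && n > 0) {
        p = malloc(n * sizeof *p);
        ok = p && fread(p, sizeof *p, n, f) == n;
    }
    fclose(f);
    if (!ok) { free(p); return -1; }
    free(s->items);
    s->items = p;
    s->len = s->cap = n;
    return 0;
}
```

On restart, the traversal would call `stack_load` and continue popping from where the last checkpoint left off. Any node processed after that checkpoint is processed again, so the per-node work should be safe to repeat.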

Related

CPLEX generic callbacks, node LP for cut separation

I am setting up a branch-and-cut algorithm using the generic callback framework through the C API of CPLEX 12.10.
At each node, the separation problem is based on the current node LP and detects locally valid cuts, which, if violated, are added for every child node of the current node.
To my understanding, the information of a current node LP is not readily available in the generic callbacks. However, I would like to use cuts generated for a parent node, to generate better cuts in the child nodes.
Is it necessary to do book-keeping about which cuts are generated at all the nodes or can this information somehow be passed on using CPLEX functionality? If the only possibility is to keep track of all generated cuts, how can this book-keeping be made thread-safe, if CPLEX calls the callback from different threads and in different nodes?
There is no way to make CPLEX keep track of this information for you. You have to roll your own.
One way to do this is to implement a dictionary that maps a node's unique id (see CPXCALLBACKINFO_NODEUID) to the information you want to store along with that node. With respect to thread-safety you only have to protect the accesses to that dictionary. To do that, use a lock (pthread_mutex on non-Windows, CRITICAL_SECTION on Windows, for example) and hold it around every lookup or update operation on that dictionary.
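A minimal sketch of such a dictionary, assuming POSIX threads. It does not call any CPLEX function: the node id is assumed to have been read in the callback already and is passed in as a `long long`. The names `node_store`, `store_put` and `store_get` are hypothetical, and the payload is a single integer standing in for whatever cut data you actually keep.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NBUCKETS 1024

/* Hypothetical per-node record; a real one would hold the cuts themselves. */
struct node_entry {
    long long          uid;    /* node id as reported by the solver */
    int                ncuts;  /* illustrative payload */
    struct node_entry *next;   /* chaining for hash collisions */
};

struct node_store {
    pthread_mutex_t    lock;
    struct node_entry *buckets[NBUCKETS];
};

void store_init(struct node_store *s)
{
    int rc = pthread_mutex_init(&s->lock, NULL);
    assert(rc == 0);
    (void)rc;
    for (size_t i = 0; i < NBUCKETS; i++) s->buckets[i] = NULL;
}

static size_t bucket_of(long long uid)
{
    return (size_t)((unsigned long long)uid % NBUCKETS);
}

/* Insert or overwrite the record for uid. Returns 0 on success. */
int store_put(struct node_store *s, long long uid, int ncuts)
{
    int rc = 0;
    size_t b = bucket_of(uid);
    pthread_mutex_lock(&s->lock);
    struct node_entry *e = s->buckets[b];
    while (e && e->uid != uid) e = e->next;
    if (!e) {
        e = malloc(sizeof *e);
        if (e) {
            e->uid = uid;
            e->next = s->buckets[b];
            s->buckets[b] = e;
        } else {
            rc = -1;
        }
    }
    if (e) e->ncuts = ncuts;
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Copy the record for uid into *ncuts. Returns 1 if found, 0 otherwise. */
int store_get(struct node_store *s, long long uid, int *ncuts)
{
    int found = 0;
    pthread_mutex_lock(&s->lock);
    for (struct node_entry *e = s->buckets[bucket_of(uid)]; e; e = e->next) {
        if (e->uid == uid) {
            *ncuts = e->ncuts;
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&s->lock);
    return found;
}
```

`store_get` copies the value out while the lock is held instead of returning a pointer into the table, so a caller never reads an entry that another thread is updating.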

Does a stack state in the visitor break the visitor pattern?

I need to process the AST of a language, and a visitor on the tree solves it nicely. However, some features would require that I keep some kind of stack (the stack of known variables) in the visitor's permanent context, which is extended and reduced as the visit progresses. Does it break the visitor pattern?
Visitors can accumulate information during their visits. In fact, the visitor implementation is the natural place for the additional state that complex operations might require (for example when expression tree nodes are far away from each other and still need to know of each other).
So it is safe to say that you can store state (even in the form of a stack) in the Visitor as long as you don't store any kind of information on the processed/visited nodes themselves.
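A minimal sketch of this in C, under some loud assumptions: C has no classes, so the "visitor" here is a context struct plus one dispatch function rather than a double-dispatch visitor, and the three node kinds (`BLOCK`, `DECL`, `USE`) are a hypothetical toy AST. The point it illustrates is that the stack of known variables lives entirely in the visitor, and the nodes are never modified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum kind { BLOCK, DECL, USE };

/* Hypothetical AST node: a block has children, DECL and USE carry a name. */
struct ast {
    enum kind          kind;
    const char        *name;      /* used by DECL and USE */
    const struct ast **children;  /* used by BLOCK */
    size_t             nchildren;
};

#define MAX_VARS 256

/* All traversal state lives in the visitor, none in the nodes. */
struct visitor {
    const char *known[MAX_VARS];  /* stack of variables currently in scope */
    size_t      depth;
    int         undefined_uses;   /* result accumulated during the visit */
};

static int is_known(const struct visitor *v, const char *name)
{
    for (size_t i = v->depth; i-- > 0; )
        if (strcmp(v->known[i], name) == 0) return 1;
    return 0;
}

void visit(struct visitor *v, const struct ast *n)
{
    switch (n->kind) {
    case DECL:
        assert(v->depth < MAX_VARS);
        v->known[v->depth++] = n->name;   /* extend the stack */
        break;
    case USE:
        if (!is_known(v, n->name)) v->undefined_uses++;
        break;
    case BLOCK: {
        size_t saved = v->depth;          /* remember the scope on entry */
        for (size_t i = 0; i < n->nchildren; i++)
            visit(v, n->children[i]);
        v->depth = saved;                 /* drop the block's variables */
        break;
    }
    }
}
```

Because the nodes are `const` throughout, the same tree can be walked by several visitors, each with its own stack.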

use of generic list

struct node
{
    void *data;
    struct node *link;
};
Given such a structure, we call it a generic linked list. What is the use of such a list in terms of its real time application use?
Its genericness allows you to create some (tested and reliable) library code around it, which can then be reused.
Of course it's not typesafe this way, that's why C++ introduced (among other things) generic template classes.
As for the use of a linked list per se: you use it where you want to store and retrieve a variable number of similar objects. Typically you don't know the number of the objects in advance and you are fine with getting them in the order you stored them. It's also quite efficient to delete an object from a linked list (once you have a pointer to its list entry).
Say you have a service which accepts requests from multiple applications and provides a handle to each of them. The service can maintain the per-request contexts in a linked list and, when it's done serving them, delete the node from the list. An empty linked list in that case would mean no application has registered with the service.
For example, consider a service built over a SIP stack; multiple applications, such as IM and presence information, can register with the service, which uses the SIP stack for signalling. The service maintains the data pertaining to each application in a linked list (that is again a question of design, but let's assume we have a limit of 5 applications to serve). The SIP response has to be redirected to the application that sent the request, and if you hold the callback pointer as one value of the node, it is simple to call it once you find the node corresponding to the response.
Each node saves a lot of information about its application and uses it for sending the response back to that application.
Probably you may want to have a look at this.
what is the use of such a list in terms of its real time application use
If all you have is that definition and a pointer to the head of the list, then it's only good for creating an arbitrary stack of objects. This is because to do anything except add or remove an object at the head of the list you have to iterate through it. Even with this limited "efficiency", such a list has its uses, e.g. as a cache for unused heap objects that you are going to recycle to avoid mallocs.
If you also have a pointer to the tail of the list, you can add objects to either end in O(1) time. This means you can use it as a queue.
If each item has a pointer to its predecessor as well as successor, you can also insert/delete items from any point in the list in O(1) time. Of course, you still need to find the object which might involve a linear scan.
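The head-only and head-plus-tail cases can be sketched as follows. This is a minimal illustration built on the `struct node` from the question; the wrapper `struct list` and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    void *data;
    struct node *link;
};

/* Hypothetical wrapper that keeps track of both ends of the list. */
struct list {
    struct node *head;
    struct node *tail;
};

/* O(1): add at the head (stack push). Returns 0 on success. */
int push_front(struct list *l, void *data)
{
    assert(l != NULL);
    struct node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->data = data;
    n->link = l->head;
    l->head = n;
    if (!l->tail) l->tail = n;
    return 0;
}

/* O(1) thanks to the tail pointer: add at the tail (queue enqueue). */
int push_back(struct list *l, void *data)
{
    assert(l != NULL);
    struct node *n = malloc(sizeof *n);
    if (!n) return -1;
    n->data = data;
    n->link = NULL;
    if (l->tail) l->tail->link = n;
    else         l->head = n;
    l->tail = n;
    return 0;
}

/* O(1): remove from the head (stack pop and queue dequeue). NULL if empty. */
void *pop_front(struct list *l)
{
    assert(l != NULL);
    struct node *n = l->head;
    if (!n) return NULL;
    void *data = n->data;
    l->head = n->link;
    if (!l->head) l->tail = NULL;
    free(n);
    return data;
}
```

Using `push_front` with `pop_front` gives a stack; using `push_back` with `pop_front` gives a queue. Removing from the tail in O(1) is not possible with this structure, because the node before the tail cannot be reached without a scan or a predecessor pointer.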

HCI: make the user wait through everything up front, or amortize?

I'm writing a Silverlight app that queries a web service to populate a tree control. Each element will have at least 2 levels of children, so something like this:
a
+-b
  +-c
d
+-g
  +-h
e
+-i
  +-j
f
+-k
  +-l
The web service API is such that I can only get one level of child nodes at a time, so the first trip, I can get a,d,e,f. To get b,g,i,k, I have to make 4 trips. Similarly, I have to make 4 more trips to get c,h,j,l. (The service does actually allow me to get all the nodes in one trip, but it doesn't give me parent-child relationships along with it :-()
My question is this: should I make the user wait for a while up front while I get all the nodes for the tree view, or should I just get the top few nodes, and get the other nodes on-demand, or in a background task? Also, the nodes can change asynchronously, so if I get all the nodes up front, I'll need a "refresh" button for the treeview, and if I do it on demand, I'll have to have a caching strategy.
Which is best for the user?
A compromise: load the first level up front, then load the remaining items in the background, with on-demand requests taking priority as required. If you load the nodes breadth first (e.g. a,d,e,f then b,g,i,k) rather than depth first (e.g. a,d,e,f followed by b,c) you can redirect your loading to be focused on the most recently expanded node.
Personally, as a user, I would prefer all the data to be loaded up front so that once the application finishes loading I can trust that I won't have to wait anymore (or at least very little).
But, I suppose it depends on several traits of your application / data:
How dynamic is the data? Does it update more often than the rate at which the user explores the nodes? If it does, then you will have to read the data as the user explores it; otherwise you can probably get away with only updating it occasionally and checking for the freshest data before performing important operations.
How much of the data will the user explore during normal use? If they are constantly exploring throughout the entire tree, then having the entire tree loaded is important. On the other hand, if most users will usually only expand a small portion of the tree, then maybe loading on demand is better so you don't waste their time loading data they will never see anyway.
How much effect will this have on performance? Does it really take a long time to load all the data? If there is not too much data, maybe the whole thing can be loaded in a matter of seconds, in which case the optimization will not make a significant difference to the end user and in turn will not have a good return on investment.
Most likely you don't have clear cut answers to these questions, but they're probably good to consider when you're attacking this interesting problem.
Short answer is to make the user wait for as little as possible. They will curse your name if they have to wait 10-20 seconds on application load, but not notice 0.1-0.2 seconds for a tree node to expand.
I have an app in production with a similar structure. I cannot load up-front because it'd be effectively loading the entire database. Here's my strategy:
The tree control starts with 1 level expanded below the root.
Each unexpanded node has a dummy child node in order to get the [+] expansion icon to show
When a node is expanded, it fires an event which is trapped by the app. If the only child node is the dummy one, the dummy is deleted and the children are loaded from the database.
Changes in the data are not reflected automatically by visible nodes, however the context menu for the tree has a Refresh item that can be used to refresh a node.
I have considered showing updates asynchronously, but have tended to avoid it because large amounts of data can be shown in the tree and I'm wary of DB load if I'm checking them all for changes.
The app is WinForms, written in C# using .NET 2.0.
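The dummy-child strategy above does not depend on WinForms, and can be sketched without any UI code. The following is a minimal illustration in C rather than the C# of the original app. Every name in it is hypothetical, and `demo_load` merely stands in for the database query by inventing two children per node.

```c
#include <assert.h>
#include <stdlib.h>

struct tree_node {
    int                id;
    int                is_dummy;   /* placeholder that only makes [+] appear */
    struct tree_node **children;
    size_t             nchildren;
};

/* Hypothetical loader: stores the real children of parent_id in *out and
   returns how many there are, or -1 on failure. */
typedef int (*load_fn)(int parent_id, struct tree_node ***out);

struct tree_node *node_new(int id, int is_dummy)
{
    struct tree_node *n = calloc(1, sizeof *n);
    if (n) { n->id = id; n->is_dummy = is_dummy; }
    return n;
}

/* Give an unexpanded node its single dummy child. Returns 0 on success. */
int add_dummy(struct tree_node *n)
{
    n->children = malloc(sizeof *n->children);
    if (!n->children) return -1;
    n->children[0] = node_new(-1, 1);
    if (!n->children[0]) { free(n->children); n->children = NULL; return -1; }
    n->nchildren = 1;
    return 0;
}

/* Expansion handler: on first expand, replace the dummy with real children.
   Returns the number of children loaded, 0 if already loaded, -1 on error. */
int on_expand(struct tree_node *n, load_fn load)
{
    if (n->nchildren != 1 || !n->children[0]->is_dummy)
        return 0;
    struct tree_node **real = NULL;
    int count = load(n->id, &real);
    if (count < 0) return -1;          /* keep the dummy so a retry is possible */
    free(n->children[0]);
    free(n->children);
    n->children = real;
    n->nchildren = (size_t)count;
    for (int i = 0; i < count; i++)
        (void)add_dummy(real[i]);      /* new nodes start out unexpanded */
    return count;
}

/* Stand-in for the database: every node gets two made-up children. */
int demo_load(int parent_id, struct tree_node ***out)
{
    struct tree_node **kids = malloc(2 * sizeof *kids);
    if (!kids) return -1;
    kids[0] = node_new(parent_id * 10 + 1, 0);
    kids[1] = node_new(parent_id * 10 + 2, 0);
    if (!kids[0] || !kids[1]) {
        free(kids[0]); free(kids[1]); free(kids);
        return -1;
    }
    *out = kids;
    return 2;
}
```

A Refresh command like the one described could be built on the same pieces by discarding a node's children, calling `add_dummy` again, and re-running `on_expand`.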

db4o concerns

I'm interested in using db4o as my persistence mechanism in my Desktop application but I'm concerned about a couple things.
1st concern: Accidentally clipping very complex object graphs.
Say I have a tree with a height of 10 and I fetch the root, how does it handle me storing the root object again?
From my understanding, it doesn't fetch the entire tree; it fetches the first 5 referenced layers.
So, if I make a trivial change to the root and then store it, will it clip away the nodes further down the tree, in essence deleting them?
If not, how does it handle this?
2nd concern: Extracting subgraphs in a larger object graph
Using my tree example from above... If the database contains 1 massive tree can I query for a single node within it? Since .store was called only once, does my database think it contains only 1 "record"?
Thank you.
You have to be very careful, because two things can happen: you can pull the whole db into memory, or just a partial graph (the rest of the objects will be null).
In db4o there's a notion of activation depth and update depth, which can be configured in the db4o configuration or when objects are fetched. It's the way you tell db4o how deep you want it to go when fetching referenced objects. Check the db4o web site; there's documentation about it:
http://developer.db4o.com/Resources/view.aspx/Reference/Object_Lifecycle/Activation
http://developer.db4o.com/Resources/view.aspx/Reference/Object_Lifecycle/Update_Depth
DB4O's Transparent Activation should resolve most of the fears you've expressed here.

Resources