db4o concerns - database

I'm interested in using db4o as my persistence mechanism in my desktop application, but I'm concerned about a couple of things.
1st concern: Accidentally clipping very complex object graphs.
Say I have a tree with a height of 10 and I fetch the root, how does it handle me storing the root object again?
From my understanding, it doesn't fetch the entire tree; it fetches the first 5 referenced layers.
So, if I make a trivial change to the root and then store it, will it clip away the nodes further down the tree, in essence deleting them?
If not, how does it handle this?
2nd concern: Extracting subgraphs in a larger object graph
Using my tree example from above... If the database contains 1 massive tree can I query for a single node within it? Since .store was called only once, does my database think it contains only 1 "record"?
Thank you.

You have to be very careful, because two things can happen: you can pull the whole database into memory, or only a partial graph (the rest of the objects will be null).
In db4o there's the notion of activation depth and update depth, which can be configured in the db4o configuration or when objects are fetched. It's the way you tell db4o how deep you want it to go when fetching or storing referenced objects. Check the db4o website; there's documentation about it:
http://developer.db4o.com/Resources/view.aspx/Reference/Object_Lifecycle/Activation
http://developer.db4o.com/Resources/view.aspx/Reference/Object_Lifecycle/Update_Depth
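
For example, a minimal sketch of configuring these depths with the db4o .NET embedded API (member names as I understand them from the db4o docs - verify them against your db4o version; Node is a hypothetical tree class):

using System.Collections.Generic;
using Db4objects.Db4o;

class Node { public string Name; public IList<Node> Children = new List<Node>(); }

// ...
var config = Db4oEmbedded.NewConfiguration();
config.Common.ActivationDepth = 5;  // how many reference levels a query instantiates (5 is the default)
config.Common.UpdateDepth = 1;      // how deep Store() descends when persisting changes

using (var container = Db4oEmbedded.OpenFile(config, "tree.db4o"))
{
    var root = container.Query<Node>()[0];  // nodes deeper than the activation depth are not loaded yet
    container.Activate(root, 10);           // explicitly activate further levels when you need them

    root.Name = "renamed";
    container.Store(root);  // with a shallow update depth only the root's own fields are updated;
                            // deeper nodes keep their stored state - they are not deleted
}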

DB4O's Transparent Activation should resolve most of the fears you've expressed here.
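Enabling it on the embedded configuration could look roughly like this (a sketch assuming the Db4objects.Db4o.TA support classes; your persistent classes must implement IActivatable or be instrumented at build time):

using Db4objects.Db4o;
using Db4objects.Db4o.TA;

var config = Db4oEmbedded.NewConfiguration();
config.Common.Add(new TransparentActivationSupport());  // members are activated lazily as you traverse them

using (var container = Db4oEmbedded.OpenFile(config, "tree.db4o"))
{
    // Walk the graph as deep as you like; db4o activates each object on first access,
    // so nothing is silently null and nothing extra has to be pulled up front.
}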

Related

Better way to store hierarchical data with known depth?

I have an (actually quite simple) data structure with tree-like adjacency. I am trying to find a good way to represent data for a film-industry web app which needs to store data about film projects. The data consists of:
project -> scene -> shot -> version - each adjacent to the previous in a "one-to-many" fashion.
Right now I am thinking about a simple adjacency list, but I am having trouble believing that it would be sufficiently efficient to quickly retrieve the name of the project, given just the version, as I'd have to cycle through the other tables to get it. The (simplified) layout would be like this:
[diagram: simple adjacency layout]
I was thinking about - instead of referencing only the direct parent - referencing all higher level parents (like this), knowing that the hierarchy has a fixed depth. That way, I could use these shortcuts to get my information with only one query. But is this bad data modeling? Are there any other ways to do it?
It's not good data modelling from the perspective of normalisation. If you realise that you put the wrong scene in for a project, you then have to move it and everything down the hierarchy.
But... does efficiency matter to you? How much data are you talking about? How fast do you need a response? I'd say go with what you've got and if you need it faster, have something that regularly extracts the data to a cache.
Try a method called Modified Preorder Tree Traversal: http://www.sitepoint.com/hierarchical-data-database/

How to model nested lists with many items using Google Drive Realtime API?

I'd like to model ordered nested lists of uniform items (like what you would see in a standard tree widget) using the Google Drive realtime API. These trees could get quite large, ideally working well with many thousands of items.
One approach would be:
Item:
    title: CollaborativeString
    attributes: CollaborativeMap
    children: CollaborativeList // recursively holds other Items
But I'm unsure if this is feasible when dealing with a large number of items.
An alternative might be to store all items in tree order in a single CollaborativeList and add an additional "level" attribute, then reconstruct the tree structure from that level on the client. That would mean maintaining a single big CollaborativeList instead of thousands of them. There are probably lots of other alternatives that I don't know about.
Thanks for any pointers on the best way to model this in the Google Drive Realtime API.
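
For illustration, here is a plain C# sketch (independent of the Realtime API) of how a client could rebuild the tree from such a level-annotated, pre-ordered flat list; ItemNode and the tuple shape are hypothetical:

using System.Collections.Generic;

class ItemNode
{
    public string Title;
    public List<ItemNode> Children = new List<ItemNode>();
}

static ItemNode BuildTree(IEnumerable<(int Level, string Title)> flat)
{
    var root = new ItemNode { Title = "(root)" };
    var path = new Stack<ItemNode>();
    path.Push(root);                          // the stack holds the chain of ancestors

    foreach (var (level, title) in flat)      // items in pre-order; level 1 = direct child of root
    {
        while (path.Count > level)            // climb back up to this item's parent
            path.Pop();

        var node = new ItemNode { Title = title };
        path.Peek().Children.Add(node);
        path.Push(node);
    }
    return root;
}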
So long as the total size of the document is within the size limits, there shouldn't be a significant performance difference between the approaches from a framework perspective. (One caveat: using ObjectChangedListeners with a highly connected graph may slow things down. Prefer registering listeners on the specific objects instead.)
Modeling it as a real tree makes sense, since that will be the easiest to work with, and you can use the new move operation to atomically rearrange items in the lists.

Short lived DbContext in WPF application reasonable?

In his book on DbContext, Rowan Miller shows how to use the DbSet.Local property to avoid 1) unnecessary roundtrips to the database and 2) passing around collections (created with e.g. ToList()) in the application (page 24). I then tried to follow this approach. However, I noticed that from one using {} block to another, the DbSet.Local property becomes empty:
ObservableCollection<Destination> destinationsList;

using (var context = new BAContext())
{
    var query = from d in context.Destinations …;
    query.Load();
    destinationsList = context.Destinations.Local; //Nonzero here.
}

//Do stuff with destinationsList

using (var context = new BAContext())
{
    //context.Destinations.Local zero here again;
    //So no way of getting the in-memory data from the previous using-block here?
    //Do I have to do another roundtrip to the database here to get the same data I wanted
    //to cache locally???
}
Then, what is the point on page 24? How can I avoid passing around my collections if DbSet.Local is only usable inside the using block? Furthermore, how can I benefit from change tracking if these short-lived context instances don't hand over any cached data to each other under the hood? So, if the contexts should be short-lived to free resources such as connections, do I have to give up caching? In other words, can I not have both at the same time (short-lived connections but a long-lived cache)? Then my only option would be to store the results returned by the query in my own variables, which is exactly what is discouraged in the motivation on page 24.
I am developing a WPF application which may also become multi-tiered in the future, involving WCF. I know Julia has an example of this in her book, but I currently don't have access to it. I found several others on the web, e.g. http://msdn.microsoft.com/en-us/magazine/cc700340.aspx (the old ObjectContext, but good at explaining the inter-layer collaborations). There, a long-lived context is used (its disadvantages are mentioned, but no solution to them is provided).
It's not only the single Destinations.Local that gets lost; as you surely know, all other entities fetched by the query are, too.
[Edit]:
After some more reading in Julia Lerman's book, it seems to boil down to the fact that EF does not have 2nd-level caching by default; with some (considerable, I think) effort, however, one can add 3rd-party caching solutions, as described in the book and in various articles on MSDN, CodeProject, etc.
I would have appreciated it if this had been mentioned in the section about DbSet.Local in the DbContext book: that it is in fact a first-level cache which is destroyed when the using {} block ends (just my proposal to make it more transparent to readers). After the first reading I had the impression that DbSet.Local would always return the same reference (singleton-style) in the second using {} block as well, despite the new DbContext instance.
But I am still unsure whether a 2nd-level cache is the way to go for my WPF application (Julia mentions the 2nd-level cache in her article in the context of distributed applications). Or is the way to go to load the aggregate root instances (DDD, Eric Evans) of my domain model into memory with one or a few queries in a using {} block, dispose the DbContext, and hold on only to the references to the aggregate instances, this way avoiding a long-lived context? It would be great if you could help me with this decision.
http://msdn.microsoft.com/en-us/magazine/hh394143.aspx
http://www.codeproject.com/Articles/435142/Entity-Framework-Second-Level-Caching-with-DbConte
http://blog.3d-logic.com/2012/03/31/using-tracing-and-caching-provider-wrappers-with-codefirst/
The Local property provides a “local view of all Added, Unchanged, and Modified entities in this set”. Like all change tracking it is specific to the context you are currently using.
The DB Context is a workspace for loading data and preparing changes.
If two users were to add changes at the same time, they must not know of each other's changes before those are saved. One of them may also discard their prepared changes, which would otherwise suddenly lead to problems for the other user as well.
A DB Context should indeed be short-lived, but it may live longer than "super short" when necessary. Also consider that you may not save resources by keeping it short-lived if you do not load and discard data but only add changes you will save. It is not only about resources, though, but also about the DB state potentially changing while the DB Context is still active and has data loaded; this may be important to keep in mind for longer-living contexts.
If you do not know yet all related changes you want to save into the database at once then I suggest you do not use the DB Context to store your changes in-memory but in a data structure in your code.
You can of course use entity objects for doing so without an active DB Context. This makes sense if you do not have another appropriate data class for it and do not want to create one, or if preparing the changes in the entities simply makes more sense. You can then use DbSet.Attach to attach the entities to a DB Context to save the changes when you are ready.
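
A minimal sketch of that flow, reusing BAContext and Destination from the question (requires using System.Data.Entity and System.Linq; the Name property is just illustrative):

Destination destination;

using (var context = new BAContext())
{
    destination = context.Destinations.First();   // the entity is detached once the context is disposed
}

destination.Name = "Changed offline";             // prepare changes with no context (and no connection) alive

using (var context = new BAContext())
{
    context.Destinations.Attach(destination);                  // attach the detached entity...
    context.Entry(destination).State = EntityState.Modified;   // ...and mark it as having pending changes
    context.SaveChanges();
}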

Self Tracking Entities Traffic Optimization

I'm working on a personal project using WPF with Entity Framework and Self-Tracking Entities. I have a WCF web service which exposes some methods for the CRUD operations. Today I decided to do some tests to see what actually travels over this service, and even though I expected something like this, I got really disappointed. The problem is that for a simple update (or delete) operation on just one object - let's say a Category - I send to the server the whole object graph, including all of its parent categories, their items, child categories and their items, etc. In my case it was a 170 KB XML payload on a really small database (2 main categories, about 20 categories in total, and about 60 items). I can't imagine what will happen if I have a really big database.
I tried to google for some articles concerning traffic optimization with STE, but with no success, so I decided to ask here if somebody has done something similar, knows some good practices, etc.
One of the possible ways I came up with is to get the data I need per object type with more service calls:
return context.Categories.ToList();//only the categories
...
return context.Items.ToList();//only the items
Instead of:
return context.Categories.Include("Items").ToList();
This way the categories and the items will be separated, and when making changes or deleting some objects, less data will be sent over the wire.
Has any of you faced a similar problem, and if so, how did you solve it?
We've encountered similar challenges. First of all, as you already mentioned, keep the entities as small as possible (as dictated by the desired client functionality). And second, when sending entities back over the wire to be persisted: strip all navigation properties (nested objects) when they haven't changed. This sounds very simple but is not at all trivial. What we do is recursively dig into the entities present in trackable collections of, say, the "topmost" entity (and their trackable collections, and theirs, and...) and remove them when their ChangeTracking state is "Unchanged". But be careful with this, because in some cases you still need these entities, since they have been removed from or added to trackable collections of their parent entity (in which case you shouldn't remove them).
This, what we call "StripEntity", is also mentioned (though without any code sample) in Julie Lerman's Programming Entity Framework.
And although it might not be as efficient as a more purist approach, the use of STEs saves a lot of code for queries against the database. We are not in need of optimal performance in a high-traffic situation, so STEs suit our needs and take away a lot of code to communicate with the database. You have to decide what the "best" solution is for your situation. Good luck!
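
A very rough sketch of that stripping idea, for the Category/Items graph from the question (ChangeTracker.State and ObjectState are the members the STE templates generate; the details, including how Remove itself is tracked, are assumptions to verify against your generated code; ToList() needs System.Linq):

static void StripUnchangedItems(Category category)
{
    foreach (var item in category.Items.ToList())           // iterate over a copy so we can remove
    {
        if (item.ChangeTracker.State == ObjectState.Unchanged)
        {
            category.Items.Remove(item);                     // this child carries no changes of its own
        }
    }
    // Caveats from above: a child whose membership in the collection changed must be kept even
    // if its own state is Unchanged, and the Remove call may itself be recorded as a collection
    // change while tracking is enabled - a real implementation has to handle both.
}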
You can find an Entity Framework project item at http://selftrackingentity.codeplex.com/. With version 0.9.8, I added a method called GetObjectGraphChanges() that returns an optimized entity object graph containing only the objects that have changes.
Also, there are two helper methods: EstimateObjectGraphSize() and EstimateObjectGraphChangeSize(). The first returns the estimated size of the whole entity object graph; the latter returns the estimated size of the optimized graph with only the objects that have changes. With these two helper methods, you can decide whether it makes sense to call GetObjectGraphChanges() or not.
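
A hypothetical usage sketch based only on the method names above (the actual receivers and signatures in the codeplex project may differ; serviceClient and UpdateCategory are made-up stand-ins for the WCF proxy):

if (category.EstimateObjectGraphChangeSize() < category.EstimateObjectGraphSize())
{
    serviceClient.UpdateCategory(category.GetObjectGraphChanges());  // send only the changed part of the graph
}
else
{
    serviceClient.UpdateCategory(category);                          // not worth stripping; send as-is
}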

HCI: make the user wait through everything up front, or amortize?

I'm writing a Silverlight app that queries a web service to populate a tree control. Each element will have at least 2 levels of children, so something like this:
a
+-b
  +-c
d
+-g
  +-h
e
+-i
  +-j
f
+-k
  +-l
The web service API is such that I can only get one level of child nodes at a time, so the first trip, I can get a,d,e,f. To get b,g,i,k, I have to make 4 trips. Similarly, I have to make 4 more trips to get c,h,j,l. (The service does actually allow me to get all the nodes in one trip, but it doesn't give me parent-child relationships along with it :-()
My question is this: should I make the user wait for a while up front while I get all the nodes for the tree view, or should I just get the top few nodes, and get the other nodes on-demand, or in a background task? Also, the nodes can change asynchronously, so if I get all the nodes up front, I'll need a "refresh" button for the treeview, and if I do it on demand, I'll have to have a caching strategy.
Which is best for the user?
A compromise: load the first level up front, then load the remaining items in the background, overridden by on-demand loading as required. If you load the nodes breadth first (e.g. a,d,e,f then b,g,i,k) rather than depth first (e.g. a,d,e,f followed by b,c), you can redirect your loading to focus on the most recently expanded node.
Personally, as a user, I would prefer all the data to be loaded up front, so that once the application finishes loading I can trust that I won't have to wait anymore (or at least very little).
But, I suppose it depends on several traits of your application / data:
How dynamic is the data? Does it update more often than the rate at which the user explores the nodes? If it does, then you will have to read the data as the user explores it; otherwise you can probably get away with only updating it occasionally and checking for the freshest data before performing important operations.
How much of the data will the user explore during normal use? If they are constantly exploring throughout the entire tree, then having the entire tree loaded is important. On the other hand, if most users will usually only expand a small portion of the tree, then loading on demand may be better so you don't waste their time loading data they will never see anyway.
How much effect will this have on performance? Does it really take a long time to load all the data? If there isn't too much data, maybe the whole thing can be loaded in a matter of seconds, in which case the optimization won't be noticeable to the end user and in turn won't have a good return on investment.
Most likely you don't have clear cut answers to these questions, but they're probably good to consider when you're attacking this interesting problem.
The short answer is to make the user wait as little as possible. They will curse your name if they have to wait 10-20 seconds on application load, but won't notice 0.1-0.2 seconds for a tree node to expand.
I have an app in production with a similar structure. I cannot load up-front because it'd be effectively loading the entire database. Here's my strategy:
The tree control starts with 1 level expanded below the root.
Each unexpanded node has a dummy child node in order to get the [+] expansion icon to show
When a node is expanded, it fires an event which is trapped by the app. If the only child node is the dummy one, the dummy is deleted and the children are loaded from the database.
Changes in the data are not reflected automatically by visible nodes, however the context menu for the tree has a Refresh item that can be used to refresh a node.
I have considered showing updates asynchronously, but have tended to avoid it because large amounts of data can be shown in the tree and I'm wary of DB load if I'm checking them all for changes.
The app is WinForms, written in C# using .NET 2.0.
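
A minimal WinForms sketch of that dummy-node technique (LoadChildren stands in for the database call and is hypothetical; the handler is wired up with treeView1.BeforeExpand += treeView1_BeforeExpand):

private const string DummyKey = "dummy";

private void AddLazyNode(TreeNodeCollection parent, string text, object tag)
{
    TreeNode node = parent.Add(text);
    node.Tag = tag;
    node.Nodes.Add(DummyKey, "Loading...");             // dummy child so the [+] glyph is shown
}

private void treeView1_BeforeExpand(object sender, TreeViewCancelEventArgs e)
{
    TreeNode node = e.Node;
    if (node.Nodes.Count == 1 && node.Nodes[0].Name == DummyKey)
    {
        node.Nodes.Clear();                              // first expansion: drop the dummy...
        foreach (object child in LoadChildren(node.Tag)) // ...and load the real children from the DB
        {
            AddLazyNode(node.Nodes, child.ToString(), child);
        }
    }
}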

Resources