Better way to store hierarchical data with known depth? - database

I have an (actually quite simple) data structure with tree-like adjacency. I am trying to find a good way to represent data for a film-industry web app which needs to store data about film projects. The data consists of:
project -> scene -> shot -> version - each adjacent to the previous in a "one-to-many" fashion.
Right now I am thinking about a simple adjacency list, but I am having trouble believing it would be efficient enough to quickly retrieve the name of the project given just the version, as I'd have to walk up through the other tables to get it. The (simplified) layout would be like this:
[diagram: simple adjacency list layout]
I was thinking about referencing all higher-level parents (like this) instead of only the direct parent, knowing that the hierarchy has a fixed depth. That way, I could use these shortcuts to get my information with only one query. But is this bad data modeling? Are there other ways to do it?

It's not good data modelling from the perspective of normalisation. If you realise that you put the wrong scene in for a project, you then have to move it and update everything below it in the hierarchy.
But... does efficiency matter to you? How much data are you talking about? How fast do you need a response? I'd say go with what you've got and if you need it faster, have something that regularly extracts the data to a cache.
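To put the cost in perspective: with a fixed four-level hierarchy, getting from a version back to its project is one query with three joins, not a "cycle through tables". A minimal EF-style LINQ sketch (the Version/Shot/Scene/Project entity names and navigation properties are assumptions, not the asker's actual schema):

// One round trip from version to project name using the plain adjacency
// list; the fixed depth means the join chain is known at compile time.
string projectName = context.Versions
    .Where(v => v.Id == versionId)
    .Select(v => v.Shot.Scene.Project.Name)
    .Single();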

Try a method called Modified Preorder Tree Traversal: http://www.sitepoint.com/hierarchical-data-database/
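The gist: each node gets left/right numbers from a preorder walk of the tree, so any subtree is a single range query. A rough sketch, assuming a hypothetical Node entity with Lft and Rgt columns maintained by that preorder walk:

// All descendants of 'parent' lie strictly inside its (Lft, Rgt) interval,
// so fetching a whole subtree needs no recursion and no chained self-joins.
var subtree = context.Nodes
    .Where(n => n.Lft > parent.Lft && n.Rgt < parent.Rgt)
    .OrderBy(n => n.Lft)   // yields the nodes in preorder
    .ToList();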

Related

How to model nested lists with many items using Google Drive Realtime API?

I'd like to model ordered nested lists of uniform items (like what you would see in a standard tree widget) using the Google Drive Realtime API. These trees could get quite large; ideally the model would work well with many thousands of items.
One approach would be:
Item:
title: CollaborativeString
attributes: CollaborativeMap
children: CollaborativeList // recursively holds other items
But I'm unsure if this is feasible when dealing with a large number of items.
An alternative might be to store all items in tree order in a single CollaborativeList and add an additional "level" attribute, then reconstruct the tree structure from that level on the client. That would mean maintaining a single big CollaborativeList instead of thousands of them. There are probably lots of other alternatives I don't know about.
Thanks for pointers on the best way to model this in the Google Drive Realtime API.
So long as the total size of the document is within the size limits, there shouldn't be a significant performance difference between the approaches from a framework perspective. (One caveat, using ObjectChangedListeners with a highly connected graph may slow things down. Prefer registering listeners on the specific objects instead.)
Modeling it as a real tree makes sense, since that will be the easiest to work with, and you can use the new move operation to atomically rearrange items in the lists.
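For what it's worth, the level-based reconstruction the question describes is straightforward on the client. A minimal sketch (shown in C# purely for illustration; Node, Title, and Level are hypothetical names, and the flat preorder sequence would come from the single CollaborativeList):

// Rebuild a tree from a flat, preorder list of (title, level) items:
// keep a stack of the current ancestor chain; each item's parent is the
// nearest preceding item with a smaller level.
using System.Collections.Generic;

class Node
{
    public string Title;
    public int Level;                       // depth in the tree, root = 0
    public List<Node> Children = new List<Node>();
}

static class TreeBuilder
{
    public static List<Node> Build(IEnumerable<Node> flat)
    {
        var roots = new List<Node>();
        var stack = new Stack<Node>();      // ancestors of the current item
        foreach (var node in flat)
        {
            while (stack.Count > 0 && stack.Peek().Level >= node.Level)
                stack.Pop();                // unwind to this node's parent
            if (stack.Count == 0)
                roots.Add(node);
            else
                stack.Peek().Children.Add(node);
            stack.Push(node);
        }
        return roots;
    }
}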

Self Tracking Entities Traffic Optimization

I'm working on a personal project using WPF with Entity Framework and Self-Tracking Entities. I have a WCF web service which exposes some methods for the CRUD operations. Today I decided to do some tests to see what actually travels over this service, and even though I expected something like this, I got really disappointed. The problem is that for a simple update (or delete) operation on just one object - let's say a Category - I send the whole object graph to the server, including all of its parent categories, their items, child categories and their items, etc. In my case it was a 170 KB XML payload for a really small database (2 main categories, about 20 categories total, and about 60 items). I can't imagine what will happen with a really big database.
I tried to google for articles about traffic optimization with STEs, but with no success, so I decided to ask here if somebody has done something similar, knows some good practices, etc.
One of the possible ways I came up with is to get the data I need per object type with more service calls:
return context.Categories.ToList(); // only the categories
...
return context.Items.ToList(); // only the items
Instead of:
return context.Categories.Include("Items").ToList();
This way the categories and the items will be separated, and when making changes or deleting objects, the data sent over the wire will be smaller.
Have any of you faced a similar problem, and if so, how did you solve it?
We've encountered similar challenges. First of all, as you already mentioned, keep the entities as small as possible (as dictated by the desired client functionality). Second, when sending entities back over the wire to be persisted, strip all navigation properties (nested objects) that haven't changed. This sounds very simple but is not at all trivial. What we do is recursively dig into the entities present in the trackable collections of, say, the "topmost" entity (and their trackable collections, and theirs, and so on) and remove them when their change-tracking state is "Unchanged". But be careful with this, because in some cases you still need those entities: if they have been removed from or added to the trackable collections of their parent entity, you shouldn't remove them.
This, which we call "StripEntity", is also mentioned (though without any code sample) in Julie Lerman's Programming Entity Framework.
And although it might not be as efficient as a more purist approach, using STEs saves a lot of code for queries against the database. We don't need optimal performance in a high-traffic situation, so STEs suit our needs and take away a lot of the code needed to communicate with the database. You have to decide what the "best" solution is for your situation. Good luck!
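For illustration, here is a very rough sketch of the stripping idea. This is not the actual "StripEntity" code; the types IObjectWithChangeTracker, ObjectState, and TrackableCollection<T> come from the standard STE T4 template, everything else is assumed:

using System.Collections;
using System.Linq;

static class EntityStripper
{
    // Recursively remove unchanged children from trackable collections so
    // they are not serialized over the wire. Change tracking is suspended
    // while removing; otherwise the removals themselves would be recorded
    // as pending deletes -- the pitfall warned about above.
    public static void StripUnchanged(IObjectWithChangeTracker entity)
    {
        bool wasTracking = entity.ChangeTracker.ChangeTrackingEnabled;
        entity.ChangeTracker.ChangeTrackingEnabled = false;

        var collectionProps = entity.GetType().GetProperties()
            .Where(p => p.PropertyType.IsGenericType &&
                        p.PropertyType.GetGenericTypeDefinition() == typeof(TrackableCollection<>));

        foreach (var prop in collectionProps)
        {
            var children = prop.GetValue(entity, null) as IList;
            if (children == null) continue;

            for (int i = children.Count - 1; i >= 0; i--)
            {
                var child = children[i] as IObjectWithChangeTracker;
                if (child == null) continue;

                StripUnchanged(child);   // depth-first into the graph

                if (child.ChangeTracker.State == ObjectState.Unchanged)
                    children.RemoveAt(i);
            }
        }

        entity.ChangeTracker.ChangeTrackingEnabled = wasTracking;
    }
}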
You can find an Entity Framework project item at http://selftrackingentity.codeplex.com/. With version 0.9.8, I added a method called GetObjectGraphChanges() that returns an optimized entity object graph with only objects that have changes.
Also, there are two helper methods: EstimateObjectGraphSize() and EstimateObjectGraphChangeSize(). The first returns the estimated size of the whole entity object graph; the latter returns the estimated size of the optimized graph containing only objects that have changes. With these two helper methods, you can decide whether it makes sense to call GetObjectGraphChanges() or not.
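A hypothetical usage sketch (the method names come from the codeplex project above; exact signatures, return types, and the client call are assumptions):

// Only pay the cost of building the trimmed graph when it actually
// saves a meaningful amount of traffic (hypothetical API usage).
long fullSize   = category.EstimateObjectGraphSize();
long changeSize = category.EstimateObjectGraphChangeSize();

var payload = changeSize < fullSize / 2
    ? category.GetObjectGraphChanges()
    : category;

client.UpdateCategory(payload);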

Silverlight LINQtoSQL: one big dataclass, or several small ones?

I'm new to Silverlight, but being dumped right into the fray - a good way to learn, I suppose :o)
Anyway, the webapp I'm working on has a relatively complex database structure that represents various object types that are linked to each other, and I was wondering 2 things:
1- What is the recommended approach when it comes to dataclasses? Have just one big dataclass, or try and separate it into several smaller dataclasses, keeping in mind they will need to reference each other?
2- If the recommended approach is to have several dataclasses, how do you define the inter-dataclasses references?
I'm asking because I did a small test. In my DB (simplified here, real model is more complex but that's not important), I have a table "Orders" and a table "Parameters". "Orders" has a foreign key on "Parameters". What I did is create 2 dataclasses.
The first one, ParamClass, where I dropped in the "Parameters" table only, so I can have a nice "parameter" class. I then created a simple service to add basic SELECT and INSERT functionality.
The second one, OrdersClass, where I dropped in both tables, so that the relation between the tables would automatically create an "EntityRef<parameter>" variable inside the "order" class. I then removed the "parameters" class that was automatically created in the OrdersClass dataclass, since that class had already been declared in the ParamClass dataclass. Again I created a small service to test it.
So far so good; it builds happily. The problem is that when I try to handle things in the application code, having added service references for both dataclasses, it is not happy doing something like:
OrdersServiceReference.order myOrder = new OrdersServiceReference.order();
myOrder.parameter = new ParamServiceReference.parameter(); //<-PROBLEM IS HERE
It complains that it cannot implicitly convert from type 'MytestDC.ParamServiceReference.parameter' to 'MytestDC.OrdersServiceReference.parameter'.
Do I somehow need to declare some sort of reference to ParamClass from OrdersClass, or how do I "convert" one to the other?
Is this even a recommended and efficient way of doing this?
Since it's a team project, I initially wanted to separate the dataclasses so that they (and their services) can be easily checked out by one member without checking out the entire dataclass.
Any help appreciated!
PS: using Silverlight 4, in case that's important
Based on the widely accepted Single Responsibility Principle (SRP), a class should always be responsible for one task, and one task only.
That pretty much invalidates your "one big dataclass" approach.
I would always recommend smaller, more manageable bits that can be combined, instead of one humongous class that does everything (except brew coffee for you).
Resources for the SRP:
Wikipedia on SRP
OODesign: Single Responsibility Principle
ObjectMentor: list of articles on good app design - which has a few links to PDF documents, like this one on SRP written by Robert C. Martin - the "guru" on proper OO design
OK, some more research led me to this: it is not simple to separate classes from a relational model using LINQtoSQL. I ended up switching to an Entity Framework approach, which itself doesn't handle it gracefully (see here and there, for example), but at least it solved another major problem I had with LINQtoSQL.
There are other ORMs out there that are apparently much more capable at this (NHibernate comes up often in recommendations), unfortunately, I don't have time to investigate them now, being under such a tight deadline.
As for the referencing, it was quite simple; change the line to:
myOrder.parameter = new OrderServiceReference.parameter();
even though I removed the declaration from that dataclass - each service reference generates its own proxy types, so the "parameter" must come from the same reference as the "order" it is assigned to.
Hope this helps someone!

Cheapest Way To Export/Import Array Contents To File - AS3/AIR

I'm working on a basic editor application. It uses an array of varying size that I want to store to disk. This will eventually be in an AIR application, but for now it's just an AS3 project in Flex.
I want to store the array in a file. The application edits the data, so it doesn't need to be human readable. I want it to be in whatever format will be quickest to store and load back into the array when I need that data again.
Any recommendations?
Edit: It strikes me that importing/exporting in a way that can be immediately cast to an Array would probably be the cheapest thing, rather than some sort of iteration - if that's possible. Another obvious option is storing the data as a simple comma-delimited string and using the String.split() function to get an array. Though again, the question is what would be cheapest - and I'm not quite convinced that's it.
I'll also add that it needs to be some sort of permanent file, so a shared object - while possibly the fastest - isn't really a long-term solution.
I think the fastest and easiest way is to use a shared object. It stores native objects, so there are no serialization/deserialization steps involved. Just assign the value and read it back.
Performance-wise, it's probably the fastest route as well. If you are dealing with a large dataset and are sure it will be an AIR app, you can use AIR's embedded SQLite database, but that will definitely take much more work.
First, take a look at this answer.
As for saving the contents of an Array, consider JSON using the export tools provided by Adobe.

db4o concerns

I'm interested in using db4o as my persistence mechanism in my Desktop application but I'm concerned about a couple things.
1st concern: Accidentally clipping very complex object graphs.
Say I have a tree with a height of 10 and I fetch the root; how does db4o handle me storing the root object again?
From my understanding, it doesn't fetch the entire tree; it fetches only the first 5 referenced layers.
So, if I make a trivial change to the root and then store it, will it clip away the nodes further down the tree, in essence deleting them?
If not, how does it handle this?
2nd concern: Extracting subgraphs in a larger object graph
Using my tree example from above: if the database contains one massive tree, can I query for a single node within it? Since .store() was called only once, does my database think it contains only one "record"?
Thank you.
You have to be very careful, because two things can happen: you can pull the whole database into memory, or get back just a partial graph (the rest of the objects will be null).
In db4o there's the notion of activation depth and update depth, which can be set in the db4o configuration or when objects are fetched. It's how you tell db4o how deep to go when fetching referenced objects. Check the db4o web site; there's documentation about it:
http://developer.db4o.com/Resources/view.aspx/Reference/Object_Lifecycle/Activation
http://developer.db4o.com/Resources/view.aspx/Reference/Object_Lifecycle/Update_Depth
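For example, a sketch against the db4o 8.x .NET API (the Node class and the depth values are illustrative, not prescriptive):

using System.Collections.Generic;
using Db4objects.Db4o;
using Db4objects.Db4o.Config;

public class Node
{
    public string Name;
    public Node Left, Right;
}

class Example
{
    static void Main()
    {
        IEmbeddedConfiguration config = Db4oEmbedded.NewConfiguration();
        // Activation depth (default 5) controls how deep db4o instantiates
        // referenced objects on a fetch; update depth (default 1) controls
        // how deep Store() follows references. Storing a partially activated
        // root does not delete the deeper nodes -- they simply aren't touched.
        config.Common.ActivationDepth = 10;
        config.Common.UpdateDepth = 10;

        using (IObjectContainer db = Db4oEmbedded.OpenFile(config, "tree.db4o"))
        {
            // db4o stores every reachable object individually, not as one
            // "record", so a single node inside the big tree is queryable:
            IList<Node> hits = db.Query<Node>(n => n.Name == "some-node");
        }
    }
}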
DB4O's Transparent Activation should resolve most of the fears you've expressed here.
