Map database object to source implementation - database

What would be the best way to tie a database object to a source code implementation? Basically, I want a table of "ingredients" that can be referred to by objects from another table containing a "recipe", while still being able to index and search them efficiently by their metadata. It should also take into account that some "ingredients" might inherit from other "ingredients".
Maybe I'm looking at this in a totally wrong way. I would appreciate any light on the subject.

If I've correctly understood your goal, you have these two choices:
Use an OR/M and don't try to implement the data mapping yourself from scratch.
Switch to NoSQL storage. Analyze your data model: if it's not very relational, it may be expressible in a document store like MongoDB. MongoDB already supports indexing, for example.
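For the relational route, here is a minimal sketch of the kind of table layout an OR/M could map onto. It uses Python's built-in sqlite3 module rather than any particular OR/M, and all table, column, and function names (ingredient, parent_id, ancestors, and so on) are assumptions made up for illustration. Inheritance is modeled as a self-referencing parent_id, and the metadata column is indexed so it can be searched efficiently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ingredient (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    category  TEXT NOT NULL,                      -- searchable metadata (hypothetical)
    parent_id INTEGER REFERENCES ingredient(id)   -- "inherits from" another ingredient
);
CREATE INDEX idx_ingredient_category ON ingredient(category);

CREATE TABLE recipe (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE recipe_ingredient (
    recipe_id     INTEGER NOT NULL REFERENCES recipe(id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(id),
    PRIMARY KEY (recipe_id, ingredient_id)
);
""")

conn.executemany("INSERT INTO ingredient VALUES (?, ?, ?, ?)", [
    (1, "flour", "dry", None),
    (2, "whole-wheat flour", "dry", 1),   # inherits from flour
    (3, "milk", "dairy", None),
])
conn.execute("INSERT INTO recipe VALUES (1, 'pancakes')")
conn.executemany("INSERT INTO recipe_ingredient VALUES (?, ?)", [(1, 2), (1, 3)])


def ancestors(ingredient_id):
    """Follow parent_id links with a recursive CTE, nearest parent first."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM ingredient WHERE id = ?
            UNION ALL
            SELECT i.id, i.name, i.parent_id, c.depth + 1
            FROM ingredient i JOIN chain c ON i.id = c.parent_id
        )
        SELECT name FROM chain WHERE depth > 0 ORDER BY depth
    """, (ingredient_id,)).fetchall()
    return [row[0] for row in rows]


def recipe_ingredients(recipe_id):
    """Names of the ingredients a recipe refers to, in alphabetical order."""
    rows = conn.execute("""
        SELECT i.name FROM ingredient i
        JOIN recipe_ingredient ri ON ri.ingredient_id = i.id
        WHERE ri.recipe_id = ? ORDER BY i.name
    """, (recipe_id,)).fetchall()
    return [row[0] for row in rows]
```

An OR/M would typically generate classes for these tables and hide the join and the recursive lookup behind object properties; the schema underneath would look broadly similar.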

Related

How do NoSQL (MongoDB) search for something quickly?

I'm not sure if this is the right place to ask this question (please move if it isn't).
I was wondering how NoSQL databases like MongoDB search for items. As I understand it, NoSQL is merely a database that is NOT SQL (no actual structure). I'll use MongoDB as the example, since that's the only one I've had experience with. MongoDB has collections (instead of tables) that store items in JSON format.
SQL has columns we can search and sort by. With items stored in JSON format, though, wouldn't the database need a step like parse or json_decode to extract each item before comparing it, thereby slowing down the request?
Appreciate any info in advance.
In many key-value NoSQL stores, the location of each item is kept in a hash map keyed by the hash of its primary key, so retrieval by key is very fast.
You are mistaken if you think that the external representation as JSON implies the internal data format. (Just as the display as a "table" does not imply anything about the internal structures of a SQL database.)
Fortunately, MongoDB is open source, so you can just look into the source: https://github.com/mongodb/mongo
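To illustrate why a query does not have to decode every stored item, here is a toy sketch of a secondary index. This is not MongoDB's actual implementation; the class and method names (TinyCollection, create_index, find_eq) are invented for the example. The idea is only that the indexed value is extracted once, at write time, and kept in a sorted structure, so a lookup is a binary search rather than a scan.

```python
import bisect


class TinyCollection:
    """Toy document collection with one sorted secondary index."""

    def __init__(self):
        self.docs = {}             # doc id -> document (a dict)
        self.index = []            # sorted list of (field value, doc id)
        self.indexed_field = None

    def create_index(self, field):
        """Index an existing or empty collection on one field."""
        self.indexed_field = field
        self.index = sorted(
            (doc[field], doc_id)
            for doc_id, doc in self.docs.items()
            if field in doc
        )

    def insert(self, doc_id, doc):
        """Store the document and update the index at write time."""
        self.docs[doc_id] = doc
        if self.indexed_field in doc:
            bisect.insort(self.index, (doc[self.indexed_field], doc_id))

    def find_eq(self, value):
        """Binary-search the index; only matching documents are touched."""
        pos = bisect.bisect_left(self.index, (value,))
        matches = []
        while pos < len(self.index) and self.index[pos][0] == value:
            matches.append(self.docs[self.index[pos][1]])
            pos += 1
        return matches


people = TinyCollection()
people.create_index("age")
people.insert(1, {"name": "ann", "age": 30})
people.insert(2, {"name": "bob", "age": 25})
people.insert(3, {"name": "cy", "age": 30})
print([doc["name"] for doc in people.find_eq(30)])
```

A query on a field without an index would still have to visit every document, which is the same trade-off a SQL database has with unindexed columns.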

How to store user defined data structures?

I'm building a mobile application that records information about items and then outputs an automatically generated report.
Each Item may be of various types, each type requires different information to be recorded. The user needs to be able to specify what is to be stored for each type.
Is there a "best" way to store this type of information in a relational database?
My current plan is to have a Type table that maps Types to Attributes that need to be recorded for that Type. Does this sound sensible? I imagine that it may get messy when I come to produce reports from this data.
I guess I need a way of generalising the information that needs to be recorded?
I think I just need some pointers in the right direction.
Thanks!
Only a suggestion, might not be an answer... use JSON and go for a NoSQL database. Today it is more convenient to operate on and play around with data in a format that is not strictly relational.
That way you can define a model (or several), or create your own data structure as mentioned, and store it easily as a collection of documents of that model. NoSQL also allows structure changes without obliging you to define an entire "column" for all the "rows" present there ;)
Check out this explanation of MongoDB and NoSQL.
This is also a beautiful post that I love about data modeling in NoSQL.
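As a rough sketch of the document approach, the following keeps user-defined type definitions next to the items and stores each item as one JSON document. Everything here is hypothetical (the type names, the fields, and the plain Python list standing in for a document collection); a real application would put the documents in an actual store.

```python
import json

# Hypothetical user-defined types: field name -> expected Python type.
type_definitions = {
    "book": {"title": str, "pages": int},
    "tool": {"name": str, "weight_kg": float},
}


def validate(item_type, data):
    """Return a list of problems; an empty list means the item is valid."""
    fields = type_definitions[item_type]
    problems = [f"missing field: {name}" for name in fields if name not in data]
    problems += [
        f"wrong type for {name}"
        for name, expected in fields.items()
        if name in data and not isinstance(data[name], expected)
    ]
    return problems


def save(store, item_type, data):
    """Validate the item, then append it to the store as a JSON document."""
    problems = validate(item_type, data)
    if problems:
        raise ValueError("; ".join(problems))
    store.append(json.dumps({"type": item_type, "data": data}))


def report(store, item_type):
    """Collect the recorded data for every item of one type."""
    rows = (json.loads(raw) for raw in store)
    return [row["data"] for row in rows if row["type"] == item_type]
```

Adding a new item type then means adding one entry to type_definitions, with no change to the stored documents that already exist.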

MongoDB vs SQL Server for storing recursive trees of data

I'm currently speccing out a project that stores threaded comment trees.
For those of you unfamiliar with what I'm talking about, I'll explain: every comment has a parent comment, rather than just belonging to a thread. Currently, I'm working on a relational SQL Server model for storing this data, simply because it's what I'm used to. It looks like so:
Id int --PK
ThreadId int --FK
UserId int --FK
ParentCommentId int --FK (relates back to Id)
Comment nvarchar(max)
Time datetime
What I do is select all of the comments by ThreadId, then in code, recursively build out my object tree. I'm also doing a join to get things like the User's name.
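The tree-building step described above can be sketched as follows. This is an illustration only: it assumes the rows arrive as dictionaries keyed by the column names from the schema, and it links children to parents in a single pass rather than recursing. The function name build_tree and the "Replies" key are made up for the example.

```python
def build_tree(rows):
    """Turn flat comment rows into nested trees using ParentCommentId."""
    # Copy each row and give it an empty list for its replies.
    nodes = {row["Id"]: {**row, "Replies": []} for row in rows}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node["ParentCommentId"])
        if parent is None:
            # Top-level comment (or a comment whose parent was not selected).
            roots.append(node)
        else:
            parent["Replies"].append(node)
    return roots


rows = [
    {"Id": 1, "ParentCommentId": None, "Comment": "First!"},
    {"Id": 2, "ParentCommentId": 1, "Comment": "Reply to first"},
    {"Id": 3, "ParentCommentId": 2, "Comment": "Reply to the reply"},
    {"Id": 4, "ParentCommentId": None, "Comment": "Another top-level comment"},
]
tree = build_tree(rows)
```

Because every row is visited once and parents are found by dictionary lookup, the cost grows linearly with the number of comments in the thread.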
It just seems to me that maybe a document store like MongoDB, which is NoSQL, would be a better choice for this sort of model. But I don't know anything about it.
What would be the pitfalls if I do choose MongoDB?
If I'm storing it as a Document in MongoDB, would I have to include the User's name on each comment to prevent myself from having to pull up each user record by key, since it's not "relational"?
Do you have to aggressively cache "related" data on the objects you need them on when you're using MongoDB?
EDIT: I did find this article about storing trees of information in MongoDB. Given that one of my requirements is the ability to show a logged-in user a list of his recent comments, I'm now strongly leaning towards just using SQL Server, because I don't think I'll be able to do anything clever with MongoDB that will result in real performance benefits. But I could be wrong. I'm really hoping an expert (or two) on the matter will chime in with more information.
The main advantage of storing hierarchical data in Mongo (and other document databases) is the ability to store multiple copies of the data in ways that make queries more efficient for different use cases. In your case, it would be extremely fast to retrieve the whole thread if it were stored as a hierarchical nested document, but you'd probably also want to store each comment un-nested or possibly in an array under the user's record to satisfy your 2nd requirement. Because of the arbitrary nesting, I don't think that Mongo would be able to effectively index your hierarchy by user ID.
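A small sketch of that "multiple copies" idea, using plain Python dictionaries in place of real documents: each new comment is written twice, once nested inside the thread document and once in a flat per-user list. The field names and the add_comment helper are assumptions for illustration, not a MongoDB API.

```python
def find_comment(comments, comment_id):
    """Depth-first search for a comment inside the nested thread document."""
    for comment in comments:
        if comment["id"] == comment_id:
            return comment
        found = find_comment(comment["replies"], comment_id)
        if found is not None:
            return found
    return None


def add_comment(thread, recent_by_user, comment_id, user_id, text, parent_id=None):
    """Write the comment twice: nested in the thread and flat under the user."""
    nested = {"id": comment_id, "user_id": user_id, "text": text, "replies": []}
    if parent_id is None:
        siblings = thread["comments"]
    else:
        parent = find_comment(thread["comments"], parent_id)
        if parent is None:
            raise KeyError(parent_id)
        siblings = parent["replies"]
    siblings.append(nested)

    # Second copy: newest first, so "recent comments" needs no tree walk.
    recent_by_user.setdefault(user_id, []).insert(
        0, {"thread_id": thread["id"], "comment_id": comment_id, "text": text}
    )


thread = {"id": 7, "comments": []}
recent_by_user = {}
add_comment(thread, recent_by_user, 1, "alice", "First!")
add_comment(thread, recent_by_user, 2, "bob", "Welcome", parent_id=1)
add_comment(thread, recent_by_user, 3, "alice", "Thanks", parent_id=2)
```

The price of the fast reads is that every write touches two places, and the application has to keep the copies consistent.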
As with all NoSQL stores, you get more benefit by being able to scale out to lots of data nodes, allowing for many simultaneous readers and writers.
Hope that helps

Object oriented programming in Graph databases

Graph databases store data as nodes, properties and relations. If I need to retrieve some specific data from an object based upon a query, then I would need to retrieve multiple objects (as the query might have a lot of results).
Consider this simple scenario in object oriented programming in graph-databases:
I have a (graph) database of users, where each user is stored as an object. I need to retrieve a list of users living in a specific place (the place property is stored in the user object). So, how would I do it? I mean unnecessary data will be retrieved every time I need to do something (in this case, the entire user object might need to be retrieved). Isn't functional programming better in graph databases?
This example is just a simple analogy of the above stated question that came to my mind. Don't take it as a benchmark. So, the question remains, How great is object oriented programming in graph-databases?
A graph database is more than just vertices and edges. In most graph databases, such as neo4j, vertices have an id and edges have a label, and both also carry a list of properties. Typically in Java-based graph databases these properties are limited to Java primitives; everything else (e.g. dates) needs to be serialized to a string. The mapping to vertex/edge properties can either be done by hand, using methods such as getProperty and setProperty, or you can use something like Frames, an object mapper that uses the TinkerPop stack.
Each node has attributes that can be mapped to object fields. You can do that manually, or you can use spring-data to do the mapping.
Most graph databases have at least one kind of index for vertices/edges. InfiniteGraph, for instance, supports B-Trees, Lucene (for text) and a distributed, scalable index type. If you don't have an index on the field that you're trying to use as a filter, you'd need to traverse the graph and apply predicates yourself at each step. Hopefully, that would reduce the number of nodes to be traversed.
> I need to retrieve a list of users living in a specific place (the place property is stored in the user object).
There is a better way. Separate location from user. Instead of having a location as a property, create a node for locations.
So you can have a (u:User)-[:LIVES_IN]->(l:Location) type of relationship.
It then becomes easier to retrieve a list of users living in a specific place with a simple query:
MATCH (u:User)-[:LIVES_IN]->(l:Location) WHERE l.name = 'New York'
RETURN u, l
This will return all users living in New York without having to scan all the properties of each node. It's a faster approach.
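The difference between the two models can be sketched in plain Python. This is not how Neo4j works internally; it is only an in-memory illustration, with made-up names, of why following edges from a Location node avoids inspecting every user.

```python
from collections import defaultdict

# Model A: the place is a property on each user, so a query scans all users.
users = [
    {"name": "ann", "place": "New York"},
    {"name": "bob", "place": "Boston"},
    {"name": "cy", "place": "New York"},
]


def users_in_by_scan(place):
    """Check the place property of every single user."""
    return sorted(user["name"] for user in users if user["place"] == place)


# Model B: locations are nodes, and LIVES_IN edges are stored on the
# location side, so a query only follows the edges of one node.
lives_in = defaultdict(set)   # location name -> names of users with a LIVES_IN edge
for user in users:
    lives_in[user["place"]].add(user["name"])


def users_in_by_edges(place):
    """Start at the location node and follow its incoming LIVES_IN edges."""
    return sorted(lives_in.get(place, ()))


print(users_in_by_edges("New York"))
```

Both functions return the same users; the second one simply does work proportional to the number of residents of that location rather than to the total number of users.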
Why not use an object-oriented graph database?
InfiniteGraph is a graph database built on top of Objectivity/DB, which is a massively scalable, distributed object-oriented database.
InfiniteGraph allows you to define your vertices and edges using a standard object-oriented approach, including inheritance. You can also embed a defined data type as an attribute in another data type definition.
Because InfiniteGraph is object-oriented, it gives you access to query capabilities on complex data structures that are not available in the popular graph databases. Consider the following diagram:
In this diagram I create a query that determines the inclusion of the edge based on an evaluation of the set of CallDetail nodes hanging off the Call edge. I might only include the edge in my results if there exists a CallDetail with a particular date, or if the sum of the callDurations of all of the CallDetails that occurred between two dates is over some threshold. This is the real power of an object-oriented database in solving graph problems: you can support a much more complex data model.
I'm not sure why people have commingled the terms graph database and property graph. A property graph is but one way to implement a graph database, and not a particularly efficient one. InfiniteGraph is a schema-based database, and the schema provides several distinct advantages, one of which is object placement.
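The edge-inclusion rule described above might look roughly like this in plain Python. It is not InfiniteGraph's API; the class layout, attribute names, and the include_edge function are assumptions chosen to mirror the description of a Call edge with CallDetail objects attached.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CallDetail:
    when: date
    call_duration: int          # minutes (hypothetical unit)


@dataclass
class Call:
    """An edge between two people, carrying a set of CallDetail objects."""
    caller: str
    callee: str
    details: list = field(default_factory=list)


def include_edge(call, start, end, threshold):
    """Keep the edge if total call time between two dates exceeds a threshold."""
    total = sum(
        detail.call_duration
        for detail in call.details
        if start <= detail.when <= end
    )
    return total > threshold


call = Call("ann", "bob", [
    CallDetail(date(2024, 1, 5), 10),
    CallDetail(date(2024, 1, 20), 25),
    CallDetail(date(2024, 3, 1), 60),
])
print(include_edge(call, date(2024, 1, 1), date(2024, 1, 31), 30))
```

The point of the sketch is that the predicate looks inside structured data attached to the edge, rather than at a single flat property value.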
Disclaimer: I am the Director of Field Operation for Objectivity, Inc., maker of InfiniteGraph.

Database design help with varying schemas

I work for a billing service that uses some complicated mainframe-based billing software for its core services. We have all kinds of codes we set up that are used for tracking things: payment codes, provider codes, write-off codes, etc... Each type of code has a completely different set of data items that control what the code does and how it behaves.
I am tasked with building a new system for tracking changes made to these codes. We want to know who requested what code, who/when it was reviewed, approved, and implemented, and what the exact setup looked like for that code. The current process only tracks two of the different types of code. This project will add immediate support for a third, with the goal of also making it easy to add additional code types into the same process at a later date. My design conundrum is that each code type has a different set of data that needs to be configured with it, of varying complexity. So I have a few choices available:
I could give each code type its own table(s) and build them independently. Considering we only have three codes I'm concerned about at the moment, this would be simplest. However, this concept has already failed or I wouldn't be building a new system in the first place. It's also weak in that writing generic source code at the presentation level to display request data for any code type (even those not yet implemented) is not trivial.
Build a db schema capable of storing the data points associated with each code type: not only values, but what type they are and how they should be displayed (dropdown list from an enum of some kind). I have a decent db schema for this started, but it just feels wrong: overly complicated to query and maintain, and it ultimately requires a custom query to view full data in a nice tabular form for each code type anyway.
Storing the data points for each code request as xml. This greatly simplifies the database design and will hopefully make it easier to build the interface: just set up a schema for each code type. Then have code that validates requests to their schema, transforms a schema into display widgets and maps an actual request item onto the display. What this item lacks is how to handle changes to the schema.
My questions are: how would you do it? Am I missing any big design options? Any other pros/cons to those choices?
My current inclination is to go with the xml option. Given the schema updates are expected but extremely infrequent (probably less than one per code type per 18 months), should I just build it to assume the schema never changes, but so that I can easily add support for a changing schema later? What would that look like in SQL Server 2000 (we're moving to SQL Server 2005, but that won't be ready until after this project is supposed to be completed)?
[Update]:
One reason I'm thinking xml is that some of the data will be complex: nested/conditional data, enumerated drop down lists, etc. But I really don't need to query any of it. So I was thinking it would be easier to define this data in xml schemas.
However, le dorfier's point about introducing a whole new technology hit very close to home. We currently use very little xml anywhere. That's slowly changing, but at the moment this would look a little out of place.
I'm also not entirely sure how to build an input form from a schema, and then merge a record that matches that schema into the form in an elegant way. It will be very common to only store a partially-completed record and so I don't want to build the form from the record itself. That's a topic for a different question, though.
Based on all the comments so far, xml is still the leading candidate. Separate tables may be as good or better, but I have the feeling that my manager would see that as not different or generic enough compared to what we're currently doing.
There is no simple, generic solution to a complex, meticulous problem. You can't have both simple storage and simple app logic at the same time. Either the database structure must be complex, or else your app must be complex as it interprets the data.
I outline five solutions to this general problem in "product table, many kind of product, each product have many parameters."
For your situation, I would lean toward Concrete Table Inheritance or Serialized LOB (the XML solution).
The reason that XML might be a good solution is that:
You don't need to use SQL to pick out individual fields; you're always going to display the whole form.
Your XML can annotate fields for data type, user interface control, etc.
But of course you need to add code to parse and validate the XML. You should use an XML schema to help with this. In which case you're just replacing one technology for enforcing data organization (RDBMS) with another (XML schema).
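As a rough illustration of annotated fields, the sketch below parses one stored request and turns it into a list of form-field descriptions. The element and attribute names (codeRequest, field, dataType, widget, options) are invented for the example, and it uses Python's standard xml.etree.ElementTree module, which parses XML but does not validate against an XML schema; validation would need an additional library.

```python
import xml.etree.ElementTree as ET

# A hypothetical stored request; the last field has not been filled in yet.
REQUEST_XML = """
<codeRequest type="payment">
  <field name="description" dataType="string" widget="textbox">Refund</field>
  <field name="tradeCredit" dataType="enum" widget="dropdown"
         options="Net-30,Net-60">Net-30</field>
  <field name="maxAmount" dataType="decimal" widget="textbox"/>
</codeRequest>
"""


def form_fields(xml_text):
    """Turn each annotated <field> element into a description of a form control."""
    root = ET.fromstring(xml_text.strip())
    fields = []
    for element in root.findall("field"):
        options = element.get("options")
        fields.append({
            "name": element.get("name"),
            "data_type": element.get("dataType"),
            "widget": element.get("widget"),
            "options": options.split(",") if options else [],
            "value": element.text,      # None when the field is still empty
        })
    return fields


for form_field in form_fields(REQUEST_XML):
    print(form_field["name"], form_field["widget"], form_field["value"])
```

Because the whole document is always read and displayed together, nothing here needs SQL to reach inside the XML.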
You could also use an RDF solution instead of an RDBMS. In RDF, metadata is queriable and extensible, and you can model entities with "facts" about them. For example:
Payment code XYZ contains attribute TradeCredit (Net-30, Net-60, etc.)
Attribute TradeCredit is of type CalendarInterval
Type CalendarInterval is displayed as a drop-down
.. and so on
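The facts listed above can be sketched as subject/predicate/object triples. This is plain Python rather than a real RDF store or query language, and the predicate names (hasAttribute, hasType, displayedAs) are made up for the example; it only shows how such metadata can be queried by chaining facts.

```python
# Each fact is a (subject, predicate, object) triple.
triples = [
    ("PaymentCode:XYZ", "hasAttribute", "TradeCredit"),
    ("TradeCredit", "hasType", "CalendarInterval"),
    ("CalendarInterval", "displayedAs", "drop-down"),
]


def objects(subject, predicate):
    """All objects of the facts that match a subject and a predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def widgets_for(code):
    """Chain the facts: code -> attribute -> type -> display widget."""
    result = {}
    for attribute in objects(code, "hasAttribute"):
        for type_name in objects(attribute, "hasType"):
            for widget in objects(type_name, "displayedAs"):
                result[attribute] = widget
    return result


print(widgets_for("PaymentCode:XYZ"))
```

Adding a new code type or attribute is then a matter of adding facts, not altering a table structure.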
Re your comments: Yeah, I am wary of any solution that uses XML. To paraphrase Jamie Zawinski:
Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems.
Another solution would be to invent a little Domain-Specific Language to describe your forms. Use that to generate the user-interface. Then use the database only to store the values for form data instances.
Why do you say "this concept has already failed or I wouldn't be building a new system in the first place"? Is it because you suspect there must be a scheme for handling them in common?
Else I'd say to continue the existing philosophy, and establish additional tables. At least it would be sharing an existing pattern and maintaining some consistency in that respect.
Do a web search on "generalized specialized relational modeling". You'll find articles on how to set up tables that store the attributes of each kind of code, and the attributes common to all codes.
If you’re interested in object modeling, just search on “generalized specialized object modeling”.

Resources