Where/when does the object impedance mismatch occur? - database

When people talk about the object impedance mismatch, where does the mismatch happen? What can't a database interpret from an object model?
Thanks

Usually because objects can inherit methods and properties from other objects, and in a relational database there is no equivalent.
See the following for more information:
http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch

The basic difference is that the relational model is based on sets (rows) of data that are globally available, whereas object-oriented models are based on trees of encapsulated, hidden (not globally available) data. The two approaches are philosophically at odds: one exposes everything (clumped into tables, organized by traits), the other hides everything (clumped into nodes, organized by things). To go from relational to object-oriented, global trait data in rows has to be split up and hidden inside things. To go from object-oriented to relational, hidden thing data in objects has to be collected into rows and exposed. This can be a lot of work, and there are many different ways to approach it, depending on your situation.
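As a small illustration of that mapping work - a sketch with made-up names, not something from the question - an object model where Employee inherits from Person has no direct relational equivalent, so a common workaround is to emulate the hierarchy with joined tables:

    -- Base "class" as a table of common traits.
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );

    -- "Subclass" as a second table sharing the same key.
    CREATE TABLE employee (
        id     INTEGER PRIMARY KEY REFERENCES person (id),
        salary NUMERIC NOT NULL
    );

    -- Reassembling one Employee object means re-joining its rows.
    SELECT p.id, p.name, e.salary
    FROM person p
    JOIN employee e ON e.id = p.id;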

Related

What does "database is coherent collection of data with inherent meaning" means in database?

I have picked up the book named "Fundamentals of Database Systems, 3rd Edition" by Elmasri and Navathe to get a basic understanding first. I have started reading it from the first chapter.
A database is a logically coherent collection of data with some
inherent meaning, representing some aspect of real world and which is
designed, built and populated with data for a specific purpose.
What does the above paragraph mean?
A database is seen as a particular perspective on data and its representation in a framework of well-defined structures and interdependencies.
Breaking the definition down into parts:
'collection of data':
What it's all about.
'with some inherent meaning':
Mostly tautological; it would not constitute data otherwise. It shows, however, that databases do not exist to elicit the meaning of data. They may aid in doing so, though.
'representing some aspect of real world':
Contestable, as databases may represent data over abstract domains like mathematics (e.g. a database of twin primes). Unless this also counts as 'real world', which would make this part tautological.
'logically coherent':
Data items are related in a non-arbitrary way that allows reasoning about them. Often this aspect also includes comprehensiveness (as an objective at least) for the purpose at hand.
'for a specific purpose':
The intended perspective on the data, which co-determines the nature of structures and relationships the database will be composed of.
In particular, the choice of representations and abstractions applied (e.g. which parts of the available data are dropped) depends on the intended purpose.
'designed, built and populated with data':
Implies that databases comprise a model and use a technical base. It also implies that databases focus on the description of data.
The usefulness of such high-level descriptions is probably limited, but they may help to focus on some key issues with regard to databases:
- Describing data \
- Structuring data > modelling data
- Relating data items to each other /
- Reasoning over data
- Databases are tools
My teacher wrote the answer to what a database is in the slides; it says this:
Database:
a collection of data
represents some aspect of the real world (a database represents something in the real world)
logically coherent collection (not a random collection)
designed, built & populated for a specific purpose

Database storage design for large amounts of heterogeneous data

Here is something I've wondered about for quite some time and have not yet seen a real (good) solution for. It's a problem I imagine many games have, and that I can't easily think of how to solve (well). Ideas are welcome, but since this is not a concrete problem, don't bother asking for more details - just make them up! (and explain what you made up).
Ok, so, many games have the concept of (inventory) items, and often there are hundreds of different kinds of items, often with widely varying data structures - some items are very simple ("a rock"), others can have insane complexity or data behind them ("a book", "a programmed computer chip", "a container with more items"), etc.
Now, programming something like that is easy - just have everything implement an interface, or maybe extend an abstract root item. Since objects in the programming world don't have to look the same on the inside as on the outside, there is really no issue with how many and what kind of private fields any type of item has.
But when it comes to database serialization (binary serialization is of course no problem), you face a dilemma: how would you represent that in, say, a typical SQL database?
Some attempts at a solution that I have seen, none of which I find satisfying:
Binary serialization of the items; the database just holds an ID and a blob.
Pros: takes like 10 seconds to implement.
Cons: basically sacrifices every database feature; hard to maintain, near impossible to refactor.
A table per item type.
Pros: clean, flexible.
Cons: with a wide variety come hundreds of tables, and every search for an item has to query them all, since SQL doesn't have the concept of a table/type 'reference'.
One table with a lot of fields that aren't used by every item.
Pros: takes like 10 seconds to implement, still searchable.
Cons: waste of space and performance; hard to tell from the database which fields are in use.
A few tables with a few 'base profiles' for storage where similar items get thrown together and use the same fields for different data.
Pros: I've got nothing.
Cons: waste of space and performance; hard to tell from the database which fields are in use.
What ideas do you have? Have you seen another design that works better or worse?
It depends on whether you need to sort, filter, count, or analyze those attributes.
If you use EAV, then you will screw yourself nicely. Try doing reports on an EAV schema.
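To see why, consider a classic EAV table of (entity_id, attribute, value) rows - the names here are hypothetical, just for illustration. Reconstructing even a two-column report takes one self-join per attribute:

    -- Each attribute lives in its own row, so reporting name and
    -- weight side by side requires joining the table to itself.
    SELECT a_name.entity_id,
           a_name.value   AS name,
           a_weight.value AS weight
    FROM item_attributes a_name
    JOIN item_attributes a_weight
      ON a_weight.entity_id = a_name.entity_id
     AND a_weight.attribute = 'weight'
    WHERE a_name.attribute = 'name';

With dozens of attributes that becomes dozens of self-joins, and every value is stored as an untyped string on top of it.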
The best option is to use Table Inheritance:
    -- Common attributes live in the base table; 'type' tells you
    -- which subtype table holds the rest. (Column types are placeholders.)
    CREATE TABLE product (
        id   INTEGER PRIMARY KEY,
        type TEXT NOT NULL,
        att1 TEXT
    );

    CREATE TABLE product_x (
        id   INTEGER PRIMARY KEY REFERENCES product (id),
        att2 TEXT,
        att3 TEXT
    );

    CREATE TABLE product_y (
        id   INTEGER PRIMARY KEY REFERENCES product (id),
        att4 TEXT,
        att5 TEXT
    );
For attributes that you don't need to search/sort/analyze, use a BLOB or XML column.
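For illustration, fetching the X-type products with both their common and specialized attributes is then a single join over the tables sketched above:

    SELECT p.id, p.att1, x.att2, x.att3
    FROM product p
    JOIN product_x x ON x.id = p.id
    WHERE p.type = 'X';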
I have two alternatives for you:
One table for the base type and supplemental tables for each “class” of specialized types.
In this schema, properties common to all “objects” are stored in one table, so you have a unique record for every object in the game. For special types like books, containers, usable items, etc., you have another table for each unique set of properties or relationships those items need. Every special type will therefore be represented by two records: the base object record and the supplemental record in a particular special type table.
PROS: You can use column-based features of your database like custom domains, checks, and xml processing; you can have simpler triggers on certain types; your queries differ exactly at the point of diverging concerns.
CONS: You need two inserts for many objects.
Use a “kind” enum field and a JSONB-like field for the special type data.
This is kind of like your #1 or #3, except with some database help. Postgres added JSONB, giving you an improvement over the old EAV pattern. Other databases have a similar complex field type. In this strategy you roll your own mini schema that you stash in the JSONB field. The kind field declares what you expect to find in that JSONB field.
PROS: You can extract special type data in your queries; you can add check constraints and have a simple schema to deal with; you can benefit from indexing even though your data is heterogeneous; your queries and inserts are simple.
CONS: Your data types within JSONB-like fields are pretty limited and you have to roll your own validation.
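A minimal Postgres sketch of this second alternative (all table and column names are invented):

    CREATE TABLE item (
        id    SERIAL PRIMARY KEY,
        kind  TEXT  NOT NULL,                     -- e.g. 'rock', 'book', 'container'
        extra JSONB NOT NULL DEFAULT '{}'::jsonb  -- per-kind attributes
    );

    -- A GIN index accelerates containment queries on the JSONB data.
    CREATE INDEX item_extra_idx ON item USING gin (extra);

    -- Find all hardcover books (a containment query, so it can use the index).
    SELECT id, extra->>'author' AS author
    FROM item
    WHERE kind = 'book'
      AND extra @> '{"binding": "hardcover"}';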
Yes, it is a pain to design database formats like this. I'm designing a notification system and ran into the same problem. My notification system is, however, less complex than yours - the data it holds is at most IDs and usernames. My current solution is a mix of 1 and 3 - I serialize the data that differs between notifications, and use columns for the two usernames (some notifications have two, some only one). I shy away from method 2 because I hate that design, but that's probably just me.
However, if you can afford it, I would suggest thinking outside the realm of RDBMS - it sounds like a non-relational store (especially a key/value one) may be a better fit for this data, especially if the items differ from each other a lot.
I'm sure this has been asked here a million times before, but in addition to the options discussed in your question, you can look at an EAV schema, which is very flexible but has its own set of cons.
Another alternative is database systems which are not relational. There are object databases as well as various key/value stores and document databases.
Typically, all these things break down to some extent when you need to query against the flexible attributes. This is kind of an intrinsic problem, however. Conceptually, what does it really mean to accurately query things that are unstructured?
First of all, do you actually need the concurrency, scalability and ACID transactions of a real database? Unless you are building an MMO, your game structures will likely fit in memory anyway, so you can search and otherwise manipulate them there directly. In a scenario like this, the "database" is just a store for serialized objects, and you can replace it with the file system.
If you conclude that you do (need a database), then the key is in figuring out what "atomicity" means from the perspective of the data management.
For example, if a game item has a bunch of attributes, but none of these attributes are manipulated individually at the database level (even though they could well be at the application level), then it can be considered as "atomic" from the data management perspective. OTOH, if the item needs to be searched on some of these attributes, then you'll need a good way to index them in the database, which typically means they'll have to be separate fields.
Once you have identified attributes that should be "visible" versus the attributes that should be "invisible" from the database perspective, serialize the latter to BLOBs (or whatever), then forget about them and concentrate on structuring the former.
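A sketch of that split, with invented names: the searchable ("visible") attributes become indexed columns, and the rest of the object is serialized into one opaque blob:

    CREATE TABLE game_item (
        id    INTEGER PRIMARY KEY,
        owner INTEGER NOT NULL,   -- visible: searched on, so indexed
        kind  TEXT    NOT NULL,   -- visible: searched on
        state BLOB                -- invisible: the serialized rest of the item (bytea in Postgres)
    );

    CREATE INDEX game_item_owner_idx ON game_item (owner);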
That's where the fun starts, and you'll probably need an "all of the above" strategy for reasonable results.
BTW, some databases support "deep" indexes that can go into heterogeneous data structures. For example, take a look at Oracle's XMLIndex, though I doubt you'll use Oracle for a game.
You seem to be trying to solve this for a gaming context, so maybe you could consider a component-based approach.
I have to say that I personally haven't tried this yet, but I've been looking into it for a while and it seems to me something similar could be applied.
The idea would be that all the entities in your game would basically be a bag of components. These components could be Position, Energy, or, for your inventory case, Collectable, for example. Then, for this Collectable component you can add custom fields such as category, numItems, etc.
When you're going to render the inventory, you can simply query your entity system for items that have the Collectable component.
How can you save this into a DB? You can define the components independently in their own tables, and then for the entities (each in their own table as well) you would add a "Components" column holding an array of IDs referencing these components. These IDs would effectively act like foreign keys, though I'm aware this is not exactly how you model things in relational databases - but you get the idea.
Then, when you load the entities and their components at runtime, you can set the corresponding flag in each entity's bag of components based on which components are loaded, so you know which components the entity has; they then become queryable.
Here's an interesting read about component-based entity systems.
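A rough relational sketch of that idea - using a join table instead of an ID-array column, since an array of foreign keys can't be enforced by the database (all names invented):

    CREATE TABLE entity (
        id INTEGER PRIMARY KEY
    );

    CREATE TABLE component (
        id   INTEGER PRIMARY KEY,
        type TEXT NOT NULL,   -- 'Position', 'Energy', 'Collectable', ...
        data TEXT             -- the component's custom fields, serialized
    );

    CREATE TABLE entity_component (
        entity_id    INTEGER REFERENCES entity (id),
        component_id INTEGER REFERENCES component (id),
        PRIMARY KEY (entity_id, component_id)
    );

    -- All entities that have a Collectable component:
    SELECT e.id
    FROM entity e
    JOIN entity_component ec ON ec.entity_id = e.id
    JOIN component c         ON c.id = ec.component_id
    WHERE c.type = 'Collectable';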

Grouping postgresql tables in schemas

I'm currently building an app that contains 60 or so tables, some with meta information, some with actual data, and a couple of views on top of them.
To keep things organized I'm prefixing all table names with meta_ or view_ respectively, but might it be worth it to just put them in different schemas inside the same database?
Is this common practice?
Any reasons to not do this?
Can I create FK-constraints over different schemas?
PS: There seems to be no performance penalty judging from this answer: PostgreSQL: Performance penalty for joining two tables in separate schemas
You can use schemas in any way you like. There is no limitation in principle. They are especially useful if you need to GRANT certain rights on groups of objects, for instance to separate users inside a database. You can largely treat them like directories in a file system (the analogy has its limits, though).
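For example, in PostgreSQL (schema names invented, and assuming a role named reporting exists) you can grant rights per schema, and yes, foreign key constraints work across schemas:

    CREATE SCHEMA meta;
    CREATE SCHEMA data;

    -- Grant a group read access to everything in one schema.
    GRANT USAGE ON SCHEMA meta TO reporting;
    GRANT SELECT ON ALL TABLES IN SCHEMA meta TO reporting;

    -- A foreign key referencing a table in another schema.
    CREATE TABLE meta.category (
        id SERIAL PRIMARY KEY
    );

    CREATE TABLE data.item (
        id          SERIAL PRIMARY KEY,
        category_id INTEGER REFERENCES meta.category (id)
    );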
I use a v_ prefix for views and an f_ prefix for functions. But that is basically just notational convenience, to spot those objects quickly in a text search - if I hack the dump, for instance. Make such prefixes short; you will have to type them for the rest of your database life. Multiple prefixes from various semantic layers may have to apply to a single object.
I would not mix functional prefixes (v_ or view_) with semantic prefixes (meta_) on the same level. Rather create a separate schema for meta-objects and use prefixes to denote object types throughout all your schemata.
Whichever system you choose, stay consistent! Or it will do more harm than good.

Filtering results from the data access layer in the business layer

I haven't been able to find an answer to my question so far, and I suppose I have to ask my first question some time. Here goes.
I have a Data Access Layer that's responsible for interacting with various data storage elements and returns POCOs or collections of POCOs when querying out things.
I have a Business Layer that sits on top of this and is responsible for implementing business rules on objects returned from the Data Access Layer.
For instance, I have a SQL table of Dogs; my data access layer can return that list of dogs as a collection of Dog objects. My business layer then would do things like filter out dogs below a certain age, or any other filtering or transformation that had to happen based on business rules.
My question is this. What's the best way to handle filtering objects based on related records? Let's say I want all the people who have Cats. Right now my data access layer can return all the cats, and all the people, but doesn't do any filtering for me.
I can implement the filtering via a different data access method (i.e. DAO.GetCatPeople()), but this could get complicated if I have a multitude of related properties or relationships to handle.
I can return all records from both sides and do the matching myself in the business layer, which seems like a lot of extra work and doesn't fully utilize the SQL server.
I can write a data filtration interface, and if my data access layer changes, this layer would have to change as well.
Are there some known best practices here I could be benefiting from?
The view I take is that there are two "reasons" why you'd access data: Data Centric and Use Case Centric.
Data Centric is stuff like CRUD and other common/obvious stuff that is a no-brainer.
"Use Case" Centric is where you define an interface and matching POCOs for a specific purpose. [It's possible I'm missing some common terminology here, but Use Case Centric is what I mean.]
I think both types are valid. For the use case driven ones it's going to be mostly driven by business focused use cases, but I can see edge cases where they could be more technically driven - I'd say that was ok as long as they didn't violate any business rules or pervert your domain model.
Should Cats and Dogs know about each other? If they exist within the same domain model, and have established relationships within that model - then yes of course you should be able to make queries like GetCatPeople().
As far as managing the complexity goes, rather than GetCatPeople() you could have a more generic method that took an attribute as a parameter: GetPeopleByAnimal(animal).
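For what it's worth, such a method could push the filtering down to the database rather than matching in the business layer. A sketch of the query it might issue (the people, animals, and ownership tables are hypothetical):

    -- 'cat' here is the parameter behind GetCatPeople() /
    -- GetPeopleByAnimal(animal).
    SELECT DISTINCT p.id, p.name
    FROM people p
    JOIN ownership o ON o.person_id = p.id
    JOIN animals a   ON a.id = o.animal_id
    WHERE a.species = 'cat';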

Conceptual data modeling: Is RDF the right tool? Other solutions?

I'm planning a system that combines various data sources and lets users do simple queries on these. A part of the system needs to act as an abstraction layer that knows all connected data sources: the user shouldn't [need to] know about the underlying data "providers". A data provider could be anything: a relational DBMS, a bug tracking system, ..., a weather station. They are hooked up to the query system through a common API that defines how to "offer" data. The type of queries a certain data provider understands is given by its "offer" (e.g. I know these entities, I can give you aggregates of type X for relationship Y, ...).
My concern right now is the unification of the data: the various data providers need to agree on a common vocabulary (e.g. the name of the entity "customer" could vary across different systems). Thus, defining a high level representation of the entities and their relationships is required.
So far I have the following requirements:
I need to be able to define objects and their properties/attributes. Further, arbitrary relations between these objects need to be represented: a verb that defines the nature of the relation (e.g. "knows"), the multiplicity (e.g. 1:n) and the direction/navigability of the relation.
It occurs to me that RDF is a viable option, but is it "the right tool" for this job?
What other solutions/frameworks exist for semantic data modeling that have a machine-readable representation, and why are they better suited for this task?
I'm grateful for every opinion and pointer to helpful resources.
If you need cardinality restrictions on relations (for example "a Person knows 1:n Languages"), then RDF is not enough (see http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#richerschemas). You will need ontology languages (at least OWL-DL for cardinalities greater than 1: http://www.w3.org/TR/owl-guide/#owl_cardinality)
I'd also consider an XML database and XQuery, and perhaps topic maps (which are quite similar to RDF, but less widely known).
There is also a broad range of less standardised tools to consider, things like CouchDB (which uses JSON).
There's rarely a 'right tool', but RDF is a very strong contender given your requirements.
