How to tell if specifications are modelled using a database-oriented approach or a class-design-oriented approach

Given a problem specification, how can you tell whether it is a database design problem or a class design (object-oriented design) problem?

What comes to my mind is that in OOP, classes (objects) contain methods, whereas a database is just a collection of relationships and values.
Therefore:
If you can say a problem is about how "things" in the specification relate to each other you have a database design problem.
If it is about what the "things" in the specification can do, you're going to be modeling more along object oriented programming.

If you're using a database and creating domain objects, it's both. Database design and class design are two different things, and both are necessary if you're using a database and classes. It's not like you choose one or the other.
This is where an ORM comes into play. When your data layer retrieves information from the database, a typical approach is to transform the relational data into your domain object(s) and pass that to the business logic layer so the rest of your application can deal with domain objects instead of a relational model.
Then your ORM does the opposite when persisting data: it takes a domain entity and turns it back into a relational structure that can be saved to the database.
Note: I'm assuming a relational database here. If not, read "relational" as whatever type of persistence layer you're using.

I believe that the only specifications which should be addressed as database-oriented problems are those focused on the manipulation of structured data types. If your specification is all about "store a customer record", "delete an order record", "change the value of price from 12 to 33 for the record matching this specification", you've got a database project.
I haven't seen that kind of problem specification since the Cobol team I worked in employed a systems ~~anarchist~~ analyst. Almost every project I've worked on since has had requirements that were not about how data was stored, but what the data meant.
If you get a requirement that says "Users may create Customers. Customers can place orders. Orders contain products. Orders can have delivery methods, payment methods, and status. Status follows a business process", you have an OO problem. You probably need a storage mechanism - and a database would be an excellent choice - but you have business logic that cannot be exclusively implemented by creating structured data types and relationships.

Related

What database to choose for the hierarchical content with relations?

I want to have a reviews-like website, but not only with reviews, other types of content as well. The design of the website combines both hierarchical structure (each content object/record/entity has a parent - kind of container), and relations - each content object/record/entity has a number of related other objects:
an author of the content (i.e. user)
related comments (with their own relations, particularly authors)
item being reviewed as a separate record in DB
images from the gallery
One of the most important things is performance. Relations tend to be inefficient in NoSQL, as I've read on the net and have already found in other projects. On the other hand, the general design, apart from the relations mentioned, has an obvious content-repository-like structure, which exactly reflects the hierarchical arrangement of objects (documents, articles, reviews) such websites are built around. Also, I really like the loose structure of records in NoSQL. Yet I don't care about (nor use) things like versioning and other NoSQL-related features.
So I want to combine both worlds, hierarchical and relational, within one project, or actually its model. Apart from that, I want the project to be RESTful, so that mobile apps can use the same content through the API. Another requirement is that the content should be searchable.
What type of storage would you choose for a project like this?
I decided to go with the Graph DBs. Here's why I rejected the other ones:
I don't want to use NoSQL (Documents), since relations are hard to maintain and often require extra code infrastructure (often custom) to handle them, see e.g. Diaspora NoSQL problems
I don't want to use an RDBMS, since structure-based DBs impose well-known limitations and don't reflect the domain
I rejected the key-value and big-table DBs as they have very specific use cases
Graph databases have been used in a number of content-oriented projects and appeared to do the job surprisingly well.
You can easily model a hierarchical data structure in SQL with the following (using PostgreSQL):
CREATE TABLE comments (
    id      INTEGER PRIMARY KEY,
    parent  INTEGER REFERENCES comments (id),
    content VARCHAR(1024)
);
Where parent refers to the id of the parent comment.
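As a hedged illustration (not part of the original answer): with this adjacency-list layout, a PostgreSQL recursive CTE can pull an entire comment thread in one query. The starting id below is made up for the example.

WITH RECURSIVE thread AS (
    -- start from one root comment (id 1 is just an example)
    SELECT id, parent, content, 0 AS depth
    FROM comments
    WHERE id = 1
    UNION ALL
    -- repeatedly attach the children of comments already collected
    SELECT c.id, c.parent, c.content, t.depth + 1
    FROM comments c
    JOIN thread t ON c.parent = t.id
)
SELECT * FROM thread ORDER BY depth;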
If you are after a NoSQL database that exposes a RESTful interface, you could consider CouchDB.
You can then replicate CouchDB to Elasticsearch for more robust searching.
But if your data is relational then I would very much recommend you consider a SQL database like PostgreSQL first.

Difference between RDBMS and ORDBMS

This came up when I was reading about PostgreSQL on its wiki page, where it refers to itself as an ORDBMS. I have always known Microsoft SQL Server as an RDBMS. Can someone help me understand the main differences between a Relational Database Management System (RDBMS) and an Object-Relational Database Management System (ORDBMS), and in what scenarios I should use each?
Also, my key question relates to the fact that in the Microsoft SQL Server world we often use a layer such as Entity Framework (EF) to do the object-relational mapping on the application side. So, in the ORDBMS world, are all the responsibilities of an ORM already fulfilled in their entirety by the database itself, or are there use cases where I would still end up using an ORM like Entity Framework on top of an ORDBMS? Do people really use ORMs on top of an ORDBMS?
Taken from http://www.aspfree.com/c/a/database/introduction-to-rdbms-oodbms-and-ordbms/:
RDBMS
The main elements of an RDBMS are based on Ted Codd's 13 rules for a relational system, the concept of relational integrity, and normalization. The first fundamental of a relational database is that all information must be held in the form of tables, where all data are described using data values. The second fundamental is that each value found in the table columns does not repeat. The final fundamental is the use of Structured Query Language (SQL).
Benefits of RDBMS are that the system is simple, flexible, and productive. Because the tables are simple, data is easier to understand and communicate with others. RDBMS are flexible because users do not have to use predefined keys to input information. Also, RDBMS are more productive because SQL is easier to learn. This allows users to spend more time inputting instead of learning. More importantly, RDBMS’s biggest advantage is the ease with which users can create and access data and extend it if needed. After the original database is created, new data categories can be added without the existing application being changed.
There are limitations to the relational database management system. First, relational databases do not have enough storage area to handle data such as images and digital audio/video. The system was originally created to handle the integration of media, traditional fielded data, and templates. Another limitation of the relational database is its inadequacy to operate with languages outside of SQL. After its original development, languages such as C++ and JavaScript were developed. However, relational databases do not work efficiently with these languages. A third limitation is the requirement that information must be in tables where relationships between entities are defined by values.
ORDBMS
Object-Relational database (ORDBMS) is the third type of database common today. ORDBMS are systems that “attempt to extend relational database systems with the functionality necessary to support a broader class of applications and, in many ways, provide a bridge between the relational and object-oriented paradigms.”
ORDBMS was created to handle new types of data such as audio, video, and image files that relational databases were not equipped to handle. In addition, its development was the result of increased usage of object-oriented programming languages, and a large mismatch between these and the DBMS software.
One advantage of ORDBMS is that it allows organizations to continue using their existing systems, without having to make major changes. A second advantage is that it allows users and programmers to start using object-oriented systems in parallel.
There are challenges in implementing an ORDBMS. The first is storage and access methods. The second is query processing, and the third is query optimization.
Most database vendors don't support it, or don't support it fully. It is complex and not broadly used. Even if the "data" is OO in nature, many databases were designed decades ago and cannot simply take on ORDBMS (or OODBMS) features. The learning curve also poses problems.
An ORDBMS/OODBMS is like the virtual registry view you see in Registry Editor: the contents appear as tree-structured objects, but internally they might be stored flat, hierarchically, or relationally. You really don't care - the APIs provide you with that view of the registry information.
Similarly, even if major players don't support (and won't support) an OO view of the database, they may provide some extensions. Or you may have to craft your own framework for OO data. A movie database with actors and directors can be represented using relations (tables). Actors, directors, and shooting locations would also be classes/objects, and can easily be represented using tables, with referential integrity imposed by the database/DB designer.
You, as a developer, would map this relational data into an object-oriented form, with Movie as a class referencing actors/directors (1:1 or 1:N). I am not aware of exactly how EF facilitates this, but it would presumably do the mapping this way.
Object-Relational Databases
Object-oriented technology on top of relational technology, in a relational context.
Objects are stored in tables of objects rather than in tables of rows.
Support for major object-oriented features: complex types, inheritance, aggregation, methods.
Advantage: an extension of a well-known technology.
Disadvantages: mixing both technologies may result in schemas that are difficult to understand, and there can be performance problems.
Object-relational systems include features such as complex object extensibility, encapsulation, inheritance, and better interfaces to OO languages.
ORDBMSs allow developers to embed new classes of data objects into the relational data model abstraction (and on top of SQL).
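As a rough, hedged sketch of what "embedding new classes of data objects" can look like in practice (PostgreSQL flavour, all names made up): a composite type used as a column type, plus table inheritance.

-- A composite (structured) type used as a column type.
CREATE TYPE address AS (
    street TEXT,
    city   TEXT,
    zip    TEXT
);

-- A base table and a derived table using PostgreSQL table inheritance.
CREATE TABLE person (
    id           SERIAL PRIMARY KEY,
    name         TEXT NOT NULL,
    home_address address
);

CREATE TABLE employee (
    salary NUMERIC(10, 2)
) INHERITS (person);

-- Composite columns are addressed with (row).field syntax.
SELECT name, (home_address).city FROM employee;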

How to handle different organizations with a single DB?

Background
I am building an online information system which users can access from any computer. I don't want to replicate the DB and code for every university or organization.
I just want a user to hit a domain like www.example.com, sign in, and use it.
A second user will also hit the same domain, www.example.com, sign in, and use it, but the data they see is different.
Scenario
Suppose one university has 200 employees, a second university has 150, and so on.
Question
Do I need to have a separate employee table for each university, or is it OK to have a single table with a column for the university ID?
I assume the second is best, but suppose I have 20 universities or organizations and a total of thousands of employees.
What is the best approach?
The same question applies to every table; this is just one example.
Thanks
The approach will depend upon the data, usage, and client requirements/restrictions.
Use an integrated model, as suggested by duffymo. This may be appropriate if each organization is part of a larger whole (i.e. all colleges are part of a state college board) and security concerns about cross-query access are minimal². This approach has a minimal amount of separation between the organizations, as the same schema¹ and relations are "openly" shared. It leads to a very simple model initially, but it can become very complicated (with compound FKs and correct usage of such) if you need relations for organization-specific values, because that adds another dimension to the data.
Implement multi-tenancy. This can be achieved with implicit filters on the relations (perhaps hidden behind views and stored procedures; a minimal sketch follows this list), different schemas, or other database-specific support. Depending on the implementation, this may or may not share schema or relations even though all data may reside in the same database. With implicit isolation, some complicated keys or relationships can be hidden or eliminated. Multi-tenancy isolation also generally makes it harder or impossible to cross-query.
Silo the databases entirely. Each customer or "organization" has a separate database. This implies separate relations and schema groups. I have found this approach to be relatively simple with automated tooling, but it does require managing multiple databases. Direct cross-querying is impossible, although "linked databases" can be used if there is a need.
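A minimal sketch of the multi-tenancy option above, assuming a single shared table with a tenant column and a view that hides the filter (all names are hypothetical):

-- Shared table with a tenant discriminator column.
CREATE TABLE employees (
    id        SERIAL PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    name      TEXT    NOT NULL
);

-- A per-tenant view so application code never writes the filter itself.
CREATE VIEW tenant_42_employees AS
    SELECT id, name
    FROM employees
    WHERE tenant_id = 42;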
Even though it's not "a single DB", in our case, we had the following restrictions 1) not allowed to ever share/expose data between organizations, and 2) each organization wanted their own local database. Thus, our product ended up using a silo approach. Make sure that the approach chosen meets customer requirements.
None of these approaches will have any issue with "thousands", "hundreds of thousands", or even "millions" of records as long as the indices and queries are correctly planned. However, switching from one to another can violate many assumed constraints and so the decision should be made earlier on.
¹ In this response I am using "schema" to refer to the security grouping of database objects (e.g. tables, views) and not the database model itself. The actual database model used can be common/shared, as we do even when using separate databases.
² An integrated approach is not necessarily insecure - but it doesn't inherently have some of the built-in isolation of other designs.
I would normalize it to have UNIVERSITY and EMPLOYEE tables, with a one-to-many relationship between them.
You'll have to take care to make sure that only people associated with a given university can see their data. Role based access will be important.
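A hedged sketch of that normalized layout (names are made up; the point is the foreign key plus scoping every query to the caller's university):

CREATE TABLE university (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE employee (
    id            SERIAL PRIMARY KEY,
    university_id INTEGER NOT NULL REFERENCES university (id),
    name          TEXT    NOT NULL
);

-- Every query issued on behalf of a signed-in user is scoped to
-- that user's university id (1 here is just an example).
SELECT e.id, e.name
FROM employee e
WHERE e.university_id = 1;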
This is called a multi-tenant architecture. You should read this:
http://msdn.microsoft.com/en-us/library/aa479086.aspx
I would go with tenant-per-schema, which means copying the structure across different schemas. However, since you should keep all your SQL DDL in source control, this is very easy to script.
It's easy to screw up and "leak" information between tenants if doing it all in the same table.
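A PostgreSQL-flavoured sketch of tenant-per-schema (schema names are made up): the same DDL script is simply run once per tenant, and the application switches schema at sign-in instead of filtering on a tenant column.

CREATE SCHEMA university_a;
CREATE SCHEMA university_b;

-- The same table definition, created once in each tenant schema.
CREATE TABLE university_a.employee (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE university_b.employee (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

-- After sign-in, point the session at the caller's schema.
SET search_path TO university_a;
SELECT id, name FROM employee;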

User defined data objects - what is the best data storage strategy?

I am building a system that allows front-end users to define their own business objects. Defining a business object involves creating data fields for that business object and then relating it to other business objects in the system - fairly straightforward stuff. My question is: what is the most efficient storage strategy?
The requirements are:
Must support business objects with potentially 100+ fields (of all common data types)
The system will eventually support hundreds of thousands of business object instances
Business objects sometimes display data and aggregates from their relationships with other business objects
Users must be able to search for business objects by their data fields (and fields from related business objects)
The two possible solutions I can envisage are:
Have a dynamic schema such that when a new business object type is created a new table is created for storing instances of that object. The object's fields become columns in the storage table.
Have a fixed schema where instance data fields are stored as rows in basically a big long table.
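For illustration only, a minimal sketch of that second option, the classic entity-attribute-value ("big long table") layout; all names are hypothetical:

-- Object types and their user-defined fields.
CREATE TABLE object_type (
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE object_field (
    id             SERIAL PRIMARY KEY,
    object_type_id INTEGER NOT NULL REFERENCES object_type (id),
    name           TEXT    NOT NULL,
    data_type      TEXT    NOT NULL   -- e.g. 'text', 'number', 'date'
);

-- One row per business object instance.
CREATE TABLE object_instance (
    id             SERIAL PRIMARY KEY,
    object_type_id INTEGER NOT NULL REFERENCES object_type (id)
);

-- One row per field value of each instance: the "big long table".
CREATE TABLE field_value (
    instance_id INTEGER NOT NULL REFERENCES object_instance (id),
    field_id    INTEGER NOT NULL REFERENCES object_field (id),
    value_text  TEXT,                 -- in practice, one column per supported type
    PRIMARY KEY (instance_id, field_id)
);

Searching by a data field then means joining field_value back to object_instance, which is where the indexing and performance concerns below come in.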
I can see pros and cons to both approaches:
the dynamic schema allows me to index search columns
the dynamic tables are potentially limited in width by the maximum number of columns a table can have
dynamic schemas rule out / cause issues with replication
the static schema means less or even no dynamic sql generation
my guess is the static schema may perform like a dog when it comes to searching across 100,000+ objects
So what is the best solution? Is there another approach I haven't thought of?
Edit: The requirement I have been given is to build a generic system capable of supporting front-end user defined business objects. There will of course be restrictions on how these objects can be constructed and related, but the requirement itself is not up for negotiation.
My client is a service provider and requires a degree of flexibility in servicing their own clients, hence the need to create business objects.
I think your problem matches very well to a graph database like Neo4j, as it's built for the requested kind of flexibility from the beginning. It stores data as nodes and relationships/edges, and both nodes and relationships can hold arbitrary properties (in a key/value fashion). One important difference to a RDBMS is that a graph database won't need to lookup the relationships in a big long table (like in your fixed schema solution), so there should be a significant performance gain there. You can find out about language bindings for Neo4j in the wiki and read what others say about it in this stackoverflow thread. Disclaimer: I'm part of the Neo4j team.
Without much understanding of your situation...
Instead of writing a general purpose one-size-fits-all business objects system (which is the holy grail for Oracle, Microsoft, SAS, etc.), why not do it the typical way, where the requirements are gathered, and a developer designs and implements the users' business objects in an effective manner?
If your users are typical, they will create a monster, which will end up running slow, and they will hate it. Most users will view the data as an Excel sheet, and not understand relationships like: parent/child. As a result there will be some crazy objects built, and impossible-to-solve reports. You'll be forced to create scripts to manually convert many old objects to better and properly defined ones, etc...
Your requirements sound a little bit like an associative database with a front end to compose and edit entities.
I agree with KM above: unless you have a very compelling reason not to, you would be better off using a traditional approach. There are a lot of development tools and practices that allow you to build a robust and scalable system. Otherwise you will have to implement much of this yourself.
I don't know the best way to do this, because it sounds like something that has already been implemented by others. If I were asked to implement this feature, I would recommend buying a wheel instead of reinventing it.
Perhaps there are reasons you have to invent your own? If so, then you should add those reasons to the requirements you listed.
If you absolutely must be this generic, I still recommend buying a system that has been architected for this requirement. Not just the storage requirements, which are the least of the problems your customer will have; but also: how do you keep the customer from screwing up totally when given this much freedom. Some of the commercial systems already meet this challenge without going out of business because of customers messing up.
If you still need to do this on your own, then I suggest that your requirements (or perhaps those of another vendor?) must include: allow the customer to get it right, and help keep the customer from getting it wrong. You'll need some sort of UI to allow the customer to define these business objects, and the UI should validate the model that the customer builds.
I recommend a UI that works at a conceptual level. As an example, see NORMA, a Visual Studio add-in for Object-Role Modeling (the "other" ORM). Consider it as an example only, if your end users cannot afford a Visual Studio Standard license. Otherwise, you'll find that it is extensible, already produces many types of artifact (from SQL in various dialects to code), and will validate the model to see that it makes sense. End users would also be able to enter sample data that they believe should be valid, and the system will validate the data against the model.
If your customers are producing sensible (if dynamic) business objects, then the question of storage will be much simpler.
Have you thought about an XML based solution? The requirements suggested to me "Build a system that allows users to dynamically generate an XML Schema and work with XML documents based on that schema." I don't know enough about storing and querying XML documents to comment on your original question.
Another possibility might be to leverage NHibernate's ability to generate database schemas. If you can dynamically generate business objects, then you can generate XML mappings or Fluent mappings and use that to generate a normalized database schema.
Every user that I have ever talked to has always wanted "everything" in their project. Part of the job of gathering requirements is to guide the user, not just write down everything they say.
Your only hope is to build several template objects, that they can add properties to, you could code your application to handle each type of these objects, but allow the user to still slightly modify each as necessary.
You need to inform the user upfront of the major flaws this type of design has. This will help you in the end, when it runs slow, or if they screw up and need help fixing something. I'd put this in writing.
How many possible objects would they really need? Perhaps you could set these up using your system first. I have developed several very customizable systems over the years and when the user is sitting at an empty screen, it is like a deer in the headlights.
In any event, good luck.

Database system that is not relational

What other types of database systems are out there? I recently came across CouchDB, which handles data in a non-relational way. It got me thinking about what other models other people are using.
So, I want to know what other types of data model are out there. (I'm not looking for any specifics, I just want to see how other people handle data storage; my interest is purely academic.)
The ones I already know are:
RDBMS (MySQL, Postgres, etc.)
Document-based approach (CouchDB, Lotus Notes)
Key/value pair (BerkeleyDB)
db4o
Quote from the "about" page:
db4o is the open source object database that enables Java and .NET developers to store and retrieve any application object with only one line of code, eliminating the need to predefine or maintain a separate, rigid data model.
Older non-relational databases:
Network Database
Hierarchical Database
Both mostly went out of style when relational became feasible.
Column-oriented databases are also a bit of a different animal. Many of them do support standard relational database SQL though. These are generally used for data warehouse type applications.
Semantic Web is also a non-relational data storage paradigm. There are no relations, all metadata is stored in the same way as data, and every entity has potentially its own unique set of attributes. Open-source projects that implement RDF, a Semantic Web standard, include Jena and Sesame.
Isn't Amazon's SimpleDB non-relational?
db4o, as mentioned by Eric, is an Object-Oriented database management system (OODBMS).
There are object-based databases (GemStone, for example). Google's BigTable and Amazon's Simple Storage I am not sure how you would categorize, but both are map-reduce based.
A non-relational document oriented database we have been looking at is Apache CouchDB.
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.
Our interest was in providing a distributed user-preferences store that would be immune to shape changes, to which we could serialize preference objects from Java and access them just as easily with JavaScript from an XULRunner-based client application.
I'd like to detail more on Bill Karwin's answer about semantic web and triplestores, since it's what I am working on at the moment, and I have something to say on it.
The idea behind a triplestore is to store a graph-based database whose data model is rooted in RDF. With RDF, you describe nodes and associations among nodes (in other words, edges). Data is organized in triples:
start node ----relation----> end node
(in RDF terms: subject --predicate--> object). With this very simple data model, any data network can be represented by adding more and more triples, provided you give a meaning to nodes and relations.
RDF is very general, and it's a graph-based data model well suited for searches looking for all triples with a particular combination of subject, predicate, or object, in any combination. Eventually, through a query language called SPARQL, you can also perform more complex queries, an operation that boils down to a graph isomorphism search on the graph, both in terms of topology and in terms of node-edge meaning (we'll see this in a moment). SPARQL allows you only SELECT (and similar) queries. No DELETE, no INSERT, no UPDATE. The information you query (e.g. the specific nodes you are interested in) is mapped into a table, which is what you get as a result of your query.
Now, topology in itself does not mean a lot. For this, schema languages have been invented. Actually, more than one, and calling them schema languages is in some cases very limiting. The most famous and widely used today are RDF Schema and OWL (Lite and Full), which derive from the now obsolete DAML+OIL. The point of these languages is, boiled down, to give a meaning to nodes (by granting them a type, also described as a triple) and to relationships (edges). Also, you can define the "range" and "domain" of these relationships, or put differently, what type the start node and the end node have: you can say, for example, that the property "numberOfWheels" can only connect a node of type Vehicle to a non-zero integer value.
ns:MyFiat --rdf:type--> ns:Vehicle
ns:MyFiat --ns:numberOfWheels--> 4
Now, you can use these ontologies in two directions: validation and inference. Validation is not that fancy today, but I've seen instances of use. Inference is what is cool today, because it allows reasoning. Inference basically takes a RDF graph containing a set of triples, takes an ontology, mixes them into a triplestore database which contains an "inference engine" and like magic the inference engine invents triples according to your ontological description. Example: suppose you just store this information in the database
ns:MyFiat --ns:numberOfWheels--> 4
and nothing else. No type is specified about this node, but the inference engine will add automatically a triple saying that
ns:MyFiat --rdf:type--> ns:Vehicle
because you said in your ontology that only objects of type Vehicle can be described by a property numberOfWheels.
Conversely, you can use the inference engine to validate your data against the ontology so as to refuse non-compliant data (sort of like XML Schema for XML). In this case, you will need both triples for your data to be accepted by the triplestore.
Additional characteristics of triplestores are formulas and context-aware storage. Formulas are statements (as usual, subject-predicate-object triples) that describe something hypothetical. I have never used formulas, so I won't go into the details of something I don't know. Context awareness basically means subgraphs: the problem with storing triples is that you don't have anything to say where the triples came from. Suppose you have two dealers that describe the price of the same component. One says that the price is 5.99 and the other 4.99. If you just store both triples into a database, you no longer know anything about who stated each piece of information. There are two ways to solve this problem.
One is reification. Reification means that you store additional triples to describe another triple. It's wasteful, and makes life hell because you have to reify each and every triple you store. The alternative is context awareness. Having a context-aware store is like being able to box a bunch of triples into a container with a label on it (the context identifier). You can now use this identifier as the subject of additional statements, hence describing a bunch of triples in a single action.
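Not part of the original answer, but as a rough mental model (and roughly how some triplestores are backed by relational storage), context-aware storage can be pictured as a quad table, where each statement carries the identifier of the graph it came from; the values below reuse the dealer example above:

-- Hypothetical quad table: one row per (context, subject, predicate, object).
CREATE TABLE quads (
    context   TEXT NOT NULL,   -- named graph / where the statement came from
    subject   TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object    TEXT NOT NULL
);

INSERT INTO quads VALUES
    ('ns:dealerA', 'ns:Component1', 'ns:price', '5.99'),
    ('ns:dealerB', 'ns:Component1', 'ns:price', '4.99');

-- The context column keeps both statements and still records who asserted what.
SELECT context, object AS price
FROM quads
WHERE subject = 'ns:Component1' AND predicate = 'ns:price';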
4. Navigational. Includes Tree/Hierarchy and Graph/Network.
File systems, the semantic web, XML, Object databases, CODASYL, and many others all fit into this category.
Those 4 are pretty much it.
There is also what is referred to as an "inverted index" or "inverted list" database. Software AG's Adabas product would be an example. As with hierarchical databases, these continue to be used in large corporate or university environments because of legacy considerations or due to a performance advantage in certain situations (typically high-end transactional applications).
There are BASE systems (Basically Available, Soft State, Eventually consistent) and they work well with simple data models holding vast volumes of data. Google's BigTable, Dojo's Persevere, Amazon's Dynamo, Facebook's Cassandra are some examples.
The illuminate Correlation Database is a new, revolutionary non-relational database. The Correlation Database Management System (CDBMS) is data-model independent and designed to efficiently handle unplanned, ad hoc queries in an analytical system environment. Unlike relational database management systems or column-oriented databases, a correlation database uses a value-based storage (VBS) architecture in which each unique data value is stored only once and an auto-generated indexing system maintains the context for all values (data is 100% indexed). Queries are performed using natural language instead of SQL (NoSQL).
Learn more at: www.datainnovationsgroup.com

Resources