Why are relational sets important? - database

A friend is developing a website and has to build a database using SQL. He asks why you need "has-a" or "is-a" relationships, since you can take the primary key of one entity set and place it in the other appropriate entity set (and vice versa) to find the relations.
I could not answer the question because I was simply taught that relational sets are how databases work.
Edit: I did not want to go into normalization. He made a point that the information is replicated in the relationship set.

Your question mixes two different levels of abstraction together, namely the conceptual level and the logical level.
At the conceptual level, one is interested in describing the information requirements for the proposed database. It's useful to do this without tilting the description towards one solution or another. One model that is useful for this purpose is the Entity-Relationship (ER) model. In this model, the subject matter is broken down into entities (subjects) and relationships among those entities. All data is seen as describing some aspect of one of the entities or one of the relationships.
"Is-a" and "has-a" relationships are relevant at this level of abstraction. At this level, relationships are identified, but not implemented.
After creating a conceptual model of the database, but before creating the database itself, it's useful to go through a logical design phase, resulting in a logical model of the database. If the database is to be relational, it's useful to make the logical model a relational one. The relational model is the next level of abstraction.
This is where primary keys and foreign keys come in. These keys implement the relationships that were identified at the conceptual stage. This is how the relational model implements relationships. At this stage, you get involved with design issues like junction tables, table composition, and normalization.
In addition to the conceptual level and the logical level, there are the physical level and the script level. But these are outside the scope of your question.
The two kinds of relationships are features of the problem to be solved. Foreign key references to primary keys are features of the proposed solution.
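To make the split between the two levels concrete, here is a minimal sketch of how a "has-a" and an "is-a" relationship might end up as keys in a relational schema. It uses Python's built-in sqlite3 module so it can be run as-is; the customer, customer_order and business_customer tables are hypothetical names invented for the illustration, not anything from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- "has-a": a customer has orders; each order row points back at its customer
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
-- "is-a": a business customer is a customer; it reuses the parent's key
CREATE TABLE business_customer (
    customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
    tax_number  TEXT NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO customer_order VALUES (10, 1, 99.5)")
conn.execute("INSERT INTO business_customer VALUES (1, 'TX-123')")

# The relationship is recovered by joining on the key
rows = conn.execute("""
    SELECT c.name, o.order_id
    FROM customer c
    JOIN customer_order o ON o.customer_id = c.customer_id
""").fetchall()

# An order for a customer that does not exist is rejected by the constraint
try:
    conn.execute("INSERT INTO customer_order VALUES (11, 999, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The relationships themselves were decided before any table existed; the key columns are only one way of recording them.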

Related

Aggregating all relations into one table SQL Server

I'm trying to design an enterprise-level database architecture. At the ERD level I have an issue.
Many of my tables have relations with each other. There may be some developments in the future, and my design should be flexible and also fast at gathering results.
Recently I created a parent table named Node, and all of my functional tables have a one-to-one relation with this table.
(Functional tables are those that hold real-life data such as Content, User, Folder, Role, ..., not those related to the application's life cycle.)
So before adding a record to any of these tables, we must add a row to the Node table and use the new NodeId in the secondary table.
The Node table alone has a many-to-many relation with itself, so I designed this table to hold all of my relationship concerns.
All of the other entities are like the User and are related to the Node table as shown above.
The problem is: does this design make my relational queries on the NodeAssoc table faster, or is it better to keep the relations separate?
You say:
There may be some developments in the future and my design should be flexible and also fast on gathering the results.
Flexibility and performance are two separate things, with different ways to approach or solve them. When you are designing a database, you have to consider database principles, and normalization is very important to keep in mind. One-to-one and many-to-many relations are, by design, not common. You mention both, which worries me.
Advice one -> Denormalize (merge) one-to-one tables into one table.
This reduces the number of joins.
Advice two -> Introduce a bridge table for each many-to-many relation,
because there could be multiple matches. Handling multiple matches without one means
complex queries, which leads to a performance drop.
Advice three -> Use proper indexes to improve performance.
Flexibility can be increased by using database views, which are stored queries. The structure of the database may change in the future, and modifying a view to match can be very fast.
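As a rough sketch of the advice above (bridge table, index, view), here is a small example. It uses SQLite through Python's sqlite3 module only so it can be run anywhere; the same idea applies to SQL Server with its own syntax. The app_user, role and user_role tables and the user_roles_v view are hypothetical names, not the Node/NodeAssoc design from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (user_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE role     (role_id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Bridge table: one row per (user, role) pair
CREATE TABLE user_role (
    user_id INTEGER NOT NULL REFERENCES app_user(user_id),
    role_id INTEGER NOT NULL REFERENCES role(role_id),
    PRIMARY KEY (user_id, role_id)
);

-- Index for lookups that start from the role side
CREATE INDEX idx_user_role_role ON user_role(role_id, user_id);

-- View: callers query this name, so the tables underneath can change later
CREATE VIEW user_roles_v AS
    SELECT u.name AS user_name, r.title AS role_title
    FROM user_role ur
    JOIN app_user u ON u.user_id = ur.user_id
    JOIN role r     ON r.role_id = ur.role_id;

INSERT INTO app_user VALUES (1, 'ann'), (2, 'bob');
INSERT INTO role VALUES (1, 'admin'), (2, 'editor');
INSERT INTO user_role VALUES (1, 1), (1, 2), (2, 2);
""")

editors = conn.execute(
    "SELECT user_name FROM user_roles_v "
    "WHERE role_title = 'editor' ORDER BY user_name"
).fetchall()
```

If the tables are later split or merged, only the view definition has to be rewritten; queries against user_roles_v stay the same.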

Expressing Relationships between Entities Informally

What are some of the different ways/illustrations people use in order to convey relationships between entities. I'm looking to quickly express relationships between different entities in a sub-system, without getting too complicated; keeping it high level but conveying the relationships clearly. I've heard of and used ERDs in the past, but I'm looking for an alternate solution that:
Is clear/concise
Can be quickly mocked, in say, an email
Conveys the relationships between entities clearly

Many-to-many relationship ERD

Good day,
Real estate companies have several Buildings. Each Building is managed by one or more Managers, and each Manager has access to one or more Buildings, so there is a many-to-many relationship between Managers and Buildings. There has to be a table such as Permissions to resolve the many-to-many relationship.
Please help me figure out the best design for the database.
I came up with two candidate diagrams. Which one is better? If neither is good, what should I change?
http://i.stack.imgur.com/Z0l6h.png
http://i.stack.imgur.com/Dg5Sv.png
Sincerely
The second picture seems closest.
I'd suggest moving the boxes around a little to show the hierarchy. Put Companies top and center, then on the next row, Managers on the left, Buildings on the right and Permissions between those two.
ER diagrams are used for two different purposes. One purpose is to illustrate the subject matter entities, and the relationships between them, as understood by subject matter experts. This is called a conceptual model of the data.
The other purpose is to illustrate a proposed database design, one where the relationships are not only expressed, but also implemented somehow. If the design is relational (which it usually is) many-to-many relationships are expressed by creating an intermediate table. This is called a physical model of the data (in some literature it's called a logical model). This is what you have done in your second diagram.
Your first diagram could be cleaned up a little by eliminating the box named "permissions", and putting a crows-foot at both ends of the line connecting Managers and Buildings.
Now to come back to your question: which one is "better"? It depends. Sometimes a conceptual diagram is better for discussing the subject matter with the ultimate stakeholders: non-technical managers who work with the data all the time, and might be called "subject matter experts".
A physical diagram is usually better when discussing the proposed design among data architects and programmers. It explains not only how the data works in concept, but also how the database is to be built. This kind of detail is glossed over by a conceptual model.
So you may end up with two diagrams, and use the appropriate one depending on your audience.
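For the implemented version, a minimal sketch of the intermediate table might look like the following. It is written with Python's sqlite3 module so it can be run directly. The column names are made up for the illustration, and the company_id columns on managers and buildings are an assumption about how the diagram ties both to Companies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE managers (
    manager_id INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL REFERENCES companies(company_id),  -- assumed
    name       TEXT NOT NULL
);
CREATE TABLE buildings (
    building_id INTEGER PRIMARY KEY,
    company_id  INTEGER NOT NULL REFERENCES companies(company_id),  -- assumed
    address     TEXT NOT NULL
);
-- Intermediate table: one row per (manager, building) access grant
CREATE TABLE permissions (
    manager_id  INTEGER NOT NULL REFERENCES managers(manager_id),
    building_id INTEGER NOT NULL REFERENCES buildings(building_id),
    PRIMARY KEY (manager_id, building_id)
);

INSERT INTO companies VALUES (1, 'Estates Ltd');
INSERT INTO managers  VALUES (1, 1, 'Ann'), (2, 1, 'Bob');
INSERT INTO buildings VALUES (1, 1, '1 Main St'), (2, 1, '2 Oak Ave');
INSERT INTO permissions VALUES (1, 1), (1, 2), (2, 1);
""")

# One manager, many buildings
buildings_for_ann = [row[0] for row in conn.execute("""
    SELECT b.address FROM permissions p
    JOIN buildings b ON b.building_id = p.building_id
    WHERE p.manager_id = 1 ORDER BY b.address
""")]

# One building, many managers
managers_for_main_st = [row[0] for row in conn.execute("""
    SELECT m.name FROM permissions p
    JOIN managers m ON m.manager_id = p.manager_id
    WHERE p.building_id = 1 ORDER BY m.name
""")]
```

The composite primary key on permissions keeps the same grant from being stored twice.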

Does a Normalized Class Design Lead to a Normalized Database Design

Motivation: I am a solo developer so I take all kinds of roles during development from programming to UI design to database design, etc. (it is pretty exhausting). We all know that one person can't be good at everything (or many things) and another thing I know for sure is that databases should be well designed and normalized.
Assumption: Assume that I am a very good object-oriented developer and I design my domain models following best practices (e.g. SOLID) and patterns.
Question: If my domain model is very well designed and I used an ORM (e.g. Nhibernate, Entity Framework, etc.) to generate the database from that model, will the generated database be normalized?
The generated database will depend on your entity mappings. If you are asking about the default mapping, then the generated database may not be normalized.
For example, if you have a nice inheritance hierarchy, Entity Framework will by default use table-per-hierarchy (TPH) mapping, which enables polymorphism by denormalizing the SQL schema. TPH uses a discriminator column to store all types from the hierarchy in a single table. TPH violates third normal form because the discriminator column is not part of the primary key, yet some columns depend on the discriminator value.
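To show what that looks like in practice, here is a hand-written approximation of a TPH-style table, run through Python's sqlite3 module. It is not the exact DDL Entity Framework emits; the payments table and the CardPayment / BankTransfer subclasses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table holds every class in the (hypothetical) Payment hierarchy
CREATE TABLE payments (
    payment_id    INTEGER PRIMARY KEY,
    discriminator TEXT NOT NULL,  -- which subclass the row holds
    amount        REAL NOT NULL,  -- base-class column
    card_number   TEXT,           -- only meaningful for CardPayment
    iban          TEXT            -- only meaningful for BankTransfer
);
INSERT INTO payments VALUES (1, 'CardPayment',  20.0, '4111-XXXX', NULL);
INSERT INTO payments VALUES (2, 'BankTransfer', 75.0, NULL, 'DE00-XXXX');
-- Nothing in the schema stops a row that contradicts its own discriminator
INSERT INTO payments VALUES (3, 'CardPayment',   5.0, NULL, 'DE00-XXXX');
""")

# Rows whose subclass-specific columns do not match the discriminator
inconsistent = conn.execute("""
    SELECT payment_id FROM payments
    WHERE (discriminator = 'CardPayment'  AND iban IS NOT NULL)
       OR (discriminator = 'BankTransfer' AND card_number IS NOT NULL)
""").fetchall()
```

Whether a column applies to a row depends on the discriminator, a non-key column, and the table alone cannot enforce that rule. That is the kind of dependency normalization is meant to remove.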

Referential Integrity and HBase

One of the first sample schemas you read about in the HBase FAQ is the Student-Course example for a many-many relationship. The schema has a Courses column in the Student table and a Students column in the Course table.
But I don't understand how in HBase you guarantee integrity between these two objects. If something were to crash after updating one table and before updating the other, we'd have a problem.
I see there is a transaction facility, but what is the cost of using this on what might be every Put? Or are there other ways to think about the problem?
We hit the same issue.
I have developed a commercial plugin for HBase that handles transactions and the relationship issues that you mention. Specifically, we utilize DataNucleus for a JDO-compliant environment. Our plugin is listed on this page http://www.datanucleus.org/products/accessplatform_3_0/datastores.html or you can go directly to our small blog http://www.inciteretail.com/?page_id=236.
We utilize JTA for our transaction service. So in your case, we would handle the relationship issue and also any inserts for index tables (Hard to have an app without index lookup and sorting!).
Without an additional log you won't be able to guarantee integrity between these two objects. HBase only has atomic updates at the row level. You could probably use that property though to create a Tx log that could recover after a failure.
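A rough sketch of the Tx log idea, as an in-memory simulation only: plain Python dicts stand in for the HBase tables, and no HBase client API is used. The single assumption carried over from HBase is that a write to one row is atomic. The enroll and recover functions and the crash flag are hypothetical names for the illustration.

```python
# In-memory stand-ins for three HBase tables; each dict maps rowkey -> row.
# Assumption: a single-row write is atomic, as in HBase; nothing else is.
student, course, tx_log = {}, {}, {}

def enroll(tx_id, student_id, course_id, crash_after_first_put=False):
    # 1. Record the intent in ONE row, so it is written atomically.
    tx_log[tx_id] = {"student": student_id, "course": course_id}
    # 2. Apply the two single-row updates.
    student.setdefault(student_id, set()).add(course_id)
    if crash_after_first_put:
        return  # simulate a crash between the two updates
    course.setdefault(course_id, set()).add(student_id)
    # 3. Both sides are written; the intent is no longer needed.
    del tx_log[tx_id]

def recover():
    # Replay every intent left behind. Adding to a set is idempotent,
    # so replaying a half-applied enrollment is safe.
    for tx_id, intent in list(tx_log.items()):
        student.setdefault(intent["student"], set()).add(intent["course"])
        course.setdefault(intent["course"], set()).add(intent["student"])
        del tx_log[tx_id]

enroll("tx1", "s1", "c1", crash_after_first_put=True)
half_applied = "c1" not in course  # the Course side never got its update
recover()
```

After recover() both tables agree again and the log is empty. A real implementation would also have to deal with concurrent writers and with readers that see the half-applied state, which this sketch ignores.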
If you have to perform two INSERTs as a single unit of work, that means you have to use a transaction manager to preserve ACID properties. There's no other way to think about the problem that I know of.
The cost is less of a concern than referential integrity. Code it properly and don't worry about performance. Your code will be the first place to look for performance problems, not the transaction manager.
Logical relational models use two main varieties of relationships: one-to-many and many-to-many. Relational databases model the former directly as foreign keys (whether explicitly enforced by the database as constraints, or implicitly referenced by your application as join columns in queries) and the latter as junction tables (additional tables where each row represents one instance of a relationship between the two main tables). There is no direct mapping of these in HBase, and often it comes down to denormalizing the data.
The first thing to note is that HBase, not having any built-in joins or constraints, has little use for explicit relationships. You can just as easily place data that is one-to-many in nature into HBase tables. But this is only a relationship in that some parts of the row in one table happen to correspond to parts of rowkeys in the other table. HBase knows nothing of this relationship, so it's up to your application to do things with it (if anything).
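A small simulation of that point, with plain Python dicts standing in for two HBase tables (no HBase API is used). The users and posts tables and the "<user rowkey>:<post number>" rowkey layout are assumptions made up for the illustration.

```python
# Plain dicts stand in for two HBase tables (rowkey -> columns).
users = {"u1": {"name": "ann"}, "u2": {"name": "bob"}}

# Assumed rowkey layout: "<user rowkey>:<post number>"
posts = {
    "u1:0001": {"text": "hello"},
    "u1:0002": {"text": "again"},
    "u2:0001": {"text": "hi"},
}

def posts_for(user_key):
    # Emulates a prefix scan over sorted rowkeys; the "join" lives here,
    # in application code, not in the store.
    prefix = user_key + ":"
    return [posts[k]["text"] for k in sorted(posts) if k.startswith(prefix)]

ann_posts = posts_for("u1")

# No constraint exists: this row references a user that was never stored,
# and the write succeeds anyway.
posts["u9:0001"] = {"text": "orphan"}
orphans = sorted(k for k in posts if k.split(":")[0] not in users)
```

The one-to-many link exists only because of how the rowkeys were chosen, and finding orphaned rows is a job the application has to do itself.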
