How is a graph-based database non-relational?

I was going through the definition of a graph-based database and found that it has entities that have relationships with each other. So I am a bit confused about why graph-based databases fall into the category of non-relational databases when they have relations among entities.
Thanks in advance.

I think your confusion is probably because you think "relational" means "relationships." It seems most developers these days believe this.
In SQL, we commonly call entities "tables" but in the original computer science that described relational databases, they were called "relations" after the term from mathematics. In short, a table has a heading, which has a finite number of named columns, and it has a set of rows, where each row has the same columns as the heading. This is a relation, and its analogy is to a table, not a relationship between tables.
The relational model includes an algebra of operations you can do on relations, and each operation yields a new relation. These include selection, projection, rename, join, and set operations like union/intersect/difference.
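As a rough illustration of that closure property (every operation takes relations and yields a relation), here is a minimal Python sketch. The helper names `relation`, `select`, `project` and `natural_join` and the sample data are hypothetical, chosen for this example only; a row is modelled as a frozenset of (column, value) pairs so that duplicates and row order cannot exist.

```python
def relation(*rows):
    """Build a relation: a set of rows, each a frozenset of (column, value) pairs."""
    return {frozenset(r.items()) for r in rows}

def select(rel, pred):
    """Selection: keep the rows for which pred(row_as_dict) is true."""
    return {row for row in rel if pred(dict(row))}

def project(rel, cols):
    """Projection: keep only the named columns; duplicate rows collapse."""
    return {frozenset((c, v) for c, v in row if c in cols) for row in rel}

def natural_join(r, s):
    """Natural join: combine rows that agree on every shared column."""
    out = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[c] == db[c] for c in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

# Hypothetical sample data.
employee = relation(
    {"eid": 717, "name": "Smith", "dept": "sales"},
    {"eid": 202, "name": "Doodle", "dept": "ops"},
)
manager = relation({"mid": 1, "dept": "sales"})

smiths = select(employee, lambda row: row["name"] == "Smith")
depts = project(employee, {"dept"})
emp_mgr = natural_join(employee, manager)
everyone = employee | smiths  # union is plain set union
```

Each result (`smiths`, `depts`, `emp_mgr`, `everyone`) is again a set of rows, so it can be fed straight into another operation.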
It also defines a set of criteria for modeling data in a set of relations, such that you avoid update anomalies, i.e. you won't have data disagree with other data in the same database. These are rules of normalization.
Graph databases don't necessarily represent relations, and they don't necessarily support relational operations in the same way.

Related

Databases - equivalent of ERD, UML and in relation data model

I have got this question:
What is the equivalent of a table in ERD, UML and the relational data model?
In Chen's ER model (and Chen-notation ERDs), data is represented as attributes of and relationships between entities. This is an interpretation of relations in the relational model, which understands data as associations between domains of values/entities. Relations (i.e. attributes and relationships) can be represented as tables, though tables and relations don't map 1-to-1 - certain rules and semantics must be applied to tables (such as eliminating merged cells, ensuring that every cell contains exactly one value, column values are from a single domain, no duplicate rows, and order of rows/columns aren't significant) to understand them as relations.
In non-Chen ERDs (the kind in products like Visual Paradigm and MySQL Workbench), tables are directly represented but called entities, and foreign key constraints are called relationships. This is reminiscent of the pre-relational network model. UML class diagrams fall in this category when used for data modeling.

What is the difference between an entity relationship model and a relational model?

I was only able to find the following two differences:
The relationships in an E-R model are explicitly defined, while they are implicit in a relational model.
Relational models require an intermediate table (often called a "junction table") to hold two foreign keys that implement the many-to-many relationship.
And why do we use the relational model when we have an E-R diagram?
You have it backwards.
The relationships in an E-R model are explicitly defined, while they are implicit in a relational model.
No. Each Relational Model (RM) database base table and query result represents an application relationship. Entity-Relationship Modeling (E-RM) schemas are just a way of organizing relational tables and constraints, one that under-uses, under-specifies and misunderstands them.
Relational models require an intermediate table (often called a "junction table") to hold two foreign keys that implement the many-to-many relationship.
No. It is Object-Relational Mapping (ORM) approaches that obscure their underlying straightforward relational application relationships, tables and constraints. The notion of "junction table" arose from ORM misunderstandings of confused presentations of the E-RM which itself misunderstands the RM.
As C J Date put it in An Introduction to Database Systems, 8th ed:
a charitable reading of [Chen's original paper] would suggest that the E/R model is indeed a data model, but one that is essentially just a thin layer on top of the basic relational model [p 426]
It is a sad comment on the state of the IT field that simple solutions are popular even when they are too simple. [p 427]
The Relational Model
Every relational table represents an application relationship.
-- employee EID has name NAME and ...
E(EID,NAME,...)
The mathematical term for such a thing, and also for a mathematical ordered-tuple set representing one, is a "relation". Hence the "Relational Model" (and "Entity-Relationship Modeling"). In mathematics relations are frequently described by parameterized statement templates for which one mathematical term is "characteristic predicate". The parameters of the predicate are columns of the table. In the RM a DBA gives a predicate for each base table and users put the rows that make a true statement from column values and the predicate into the table and leave the rows that make a false statement out.
/* now also employee 717 has name 'Smith' and ...
AND employee 202 has name 'Doodle' and ...
*/
INSERT INTO E (EID,NAME,...) VALUES
(717,'Smith',...),(202,'Doodle',...)
A query expression also has a predicate built from the relation operators and logic operators (in conditions) in it. Its value also holds the rows that make its predicate true and leaves out the ones that make it false.
/* rows where
FOR SOME E.*, M.*,
EID = E.EID AND ... AND MID = M.MID
AND employee E.EID has name E.NAME and ...
AND manager M.MID has
AND E.DEPT = M.DEPT AND E.NAME = 'Smith'
*/
SELECT E.*, M.MID
FROM E JOIN M ON E.DEPT = M.DEPT
WHERE E.NAME = 'Smith'
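The fragments above elide most columns, so they cannot be run as written. Here is a runnable sketch using Python's built-in `sqlite3` module. The `DEPT` column, the predicate given for `M`, and the sample rows are assumptions made for the illustration; they are not taken from the answer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- assumed predicate: employee EID has name NAME and works in department DEPT
    CREATE TABLE E (EID INTEGER PRIMARY KEY, NAME TEXT, DEPT TEXT);
    -- assumed predicate: manager MID manages department DEPT
    CREATE TABLE M (MID INTEGER PRIMARY KEY, DEPT TEXT);

    -- rows that make the predicates true go in; all others stay out
    INSERT INTO E (EID, NAME, DEPT) VALUES (717, 'Smith', 'sales'), (202, 'Doodle', 'ops');
    INSERT INTO M (MID, DEPT) VALUES (1, 'sales'), (2, 'ops');
""")

# The result holds exactly the rows satisfying the query's predicate:
# "employee EID has name NAME and works in DEPT, manager MID manages DEPT,
#  and NAME = 'Smith'".
rows = con.execute("""
    SELECT E.EID, E.NAME, E.DEPT, M.MID
    FROM E JOIN M ON E.DEPT = M.DEPT
    WHERE E.NAME = 'Smith'
""").fetchall()
```

Note that no foreign key is declared between `E` and `M`; the join works from the table meanings alone.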
Present rows of tables making true statements and absent rows making false statements is how we record the application situation in the database and how we interpret what the database is saying about the application situation. One can't use or interpret the database without having and understanding the predicates, ie the application relationships.
Entity-Relationship Modeling
E-RM (which does not really understand the RM) is essentially a(n unnecessary, restricted and restrictive) diagramming notation for describing (some parts of) (limited forms of) relational databases. Originally there were "entity (class)" icons/relations where the candidate key (CK) values were 1:1 with application entities plus other columns ("properties" of the "entity") and there were "relationship (class)" icons/tables which had foreign keys (FKs) to entity tables representing application relationships on multiple entities plus other things ("properties" of the "association"). An application relationship was represented by an icon with lines to the various entity icons that participated in it. (Ie the lines represented FKs. Which are not relationships but statements about constraints on tables.)
E-RM doesn't understand the relational model. It makes a pointless and misleading distinction between application entities and relationships. After all, every superkey (unique column set) of every base table or query result is in 1:1 correspondence with some application entity, not just the ones that have entity tables. Eg people can be associated by being married; but each such association is 1:1 with an entity called a marriage. This leads to inadequate normalization and constraints, hence redundancy and loss of integrity. Or when those steps are adequately done it leads to the E-R diagram not actually describing the application, which is actually described by the relational database predicates, tables and constraints. Then the E-R diagram is vague, redundant and wrong.
Shorthand E-RM and ORMs
A lot of presentations and products claiming to be E-RM warp the E-RM, let alone the RM. They use the word "relationship" to mean a FK constraint. This arises as follows. When an E-RM relationship is binary it is a symbol with two lines to its FKs. So those three things can be replaced by one line between FKs. This kind of line represents that particular binary relationship and its FKs but now the E-R relationship is not explicit in the diagram although the E-R relationship is explicit in the longhand version and it is reflected by a table in what the diagrams are pictures of, namely the relational database they are describing. This gets called a "junction table". And people talk about that line/table being/representing "an X:Y relationship" between entities and/or associations without actually ever noticing that it's a particular application relationship. And there can be many such application relationships between the same two entities and/or associations.
ORMs do this too but also replace n-ary associations by just their FKs so that the associated application relationship and table are further obscured. Active Record goes even further by defining several shorthand relationships and their tables at once, equivalent to a chain of FK lines and association icons in the longhand E-RM diagram. This is exacerbated by many modeling techniques, including versions of E-RM and ORMs, also thinking that application relationships can only be binary. Again, this arose historically from lack of understanding of the RM.
They are two different things per se. A relational model represents information as tuples, directly mapped to a relational schema. The guidelines stem from relational algebra.
Meanwhile, an ER diagram models the relationships between the users and their underlying data in a system using entities. An ER diagram can be mapped to a relational model, and finally to a working schema.

What is the purpose of data modeling cardinality?

I understand what cardinality is, so please don't explain that ;-)
I would like to know what the purpose of specifying cardinality is in data modeling, and why I should care.
Example: In an ER model you make relations and add the cardinality to the relations.
When am I going to use the cardinality further in the development process? Why should I care about the cardinality?
How, when and where do I use the cardinalities after I finish an ER model, for example?
Thanks :-)
Cardinalities tell you something important about table design. A 1:m relationship requires a foreign key column in the child table pointing back to the parent primary key column. A many-to-many relationship means a JOIN table with foreign keys pointing back to the two participants.
How, when and where do I use the cardinalities after I finish an ER model, for example?
When physically creating the database, the direction, NULL-ability and number of FKs depends on the cardinalities on both endpoints of the relationship in the ER diagram. It may even "add" or "remove" some tables and keys.
For example:
A "1:N" relationship is represented as a NOT NULL FK from the "N" table to "1" table. You cannot do it in the opposite direction and retain the same meaning.
A "0..1:N" relationship is represented as a NULL-able FK from "N" to "0..1" table.
A "1:1" relationship is represented by two NOT NULL FKs (that are also keys) forming a circular reference [1] or by merging two entities into a single physical table.
A "0..1:1" relationship is represented by two FKs, one of which is NULL-able (also under keys).
A "0..1:0..1" relationship is represented by two FKs, both NULL-able and under keys, or by a junction table with specially crafted keys.
An "M:N" relationship requires an additional (so called "junction" or "link") table. A key of that table is a combination of migrated keys from child tables.
Not all cardinalities can be (easily) represented declaratively in the physical database, but fortunately those that can tend to be most useful...
[1] Which presents a chicken-and-egg problem when inserting new data, which is typically resolved by deferring constraint checking to the end of the transaction.
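Three of the mappings listed above (1:N, 0..1:N and M:N) can be sketched with Python's built-in `sqlite3` module. The table and column names (`department`, `employee`, `mentor_id`, `project`, `assignment`) are hypothetical, and other DBMSs may need slightly different DDL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY);
    CREATE TABLE project    (proj_id INTEGER PRIMARY KEY);

    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        -- 1:N  -> NOT NULL FK on the "N" side
        dept_id   INTEGER NOT NULL REFERENCES department(dept_id),
        -- 0..1:N -> NULL-able FK on the "N" side
        mentor_id INTEGER REFERENCES employee(emp_id)
    );

    -- M:N -> junction table whose key combines both migrated keys
    CREATE TABLE assignment (
        emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );

    INSERT INTO department VALUES (10);
    INSERT INTO project VALUES (1), (2);
    INSERT INTO employee VALUES (717, 10, NULL);   -- no mentor is allowed
    INSERT INTO assignment VALUES (717, 1), (717, 2);
""")

# An employee without a department violates the 1:N (mandatory) side.
try:
    con.execute("INSERT INTO employee VALUES (202, NULL, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

n_assignments = con.execute("SELECT COUNT(*) FROM assignment").fetchone()[0]
```

The only difference between the mandatory and the optional variant is the `NOT NULL` on the FK column, which is why the cardinality from the diagram has to be known before the DDL is written.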
Cardinality is a vital piece of information about a relation between two entities. You need it for later models when the actual table architecture is being modelled. Without knowing the relationship cardinality, one cannot model the tables and the key restrictions between them.
For example, a car must have exactly 4 wheels and those wheels must be attached to exactly one car. Without cardinality, you could have a car with 3, 1, 0, 12, etc... wheels, which moreover could be shared among other cars. Of course, depending on the context, this can make sense, but it usually doesn't.
A data model is a set of constraints; without constraints, anything would be possible. Cardinality is a (special kind of) constraint. In most cultures, a marriage is a relation between exactly two persons. (In some cultures these persons must have different gender.)
The problem with data modelling is that you have to specify the constraints you wish to impose on the data. Some constraints (unique, foreign key) are more important, and less dependent on the problem domain, than others ("salary < 100000"). In most cases cardinality will be somewhere in between crucial and bogus.
Suppose you are creating the data layer of an application and you have decided to use an ORM, for example Entity Framework.
There's a point when you need to create your models and your model maps. At that point you would pull out your ERD, review the cardinality you put on your diagram and create the correct relationships so that your data layer shape matches your database shape.

Why are there "relations" on databases instead of just using SQL's join?

I always see in database articles or tutorials or... just everywhere where they use databases, they use a thing called relations. It comes to my mind instantaneously those little boxes with lists of field names and one field connected to another field in another box with a line.
I'm not an expert on databases (as you can probably tell) but in the little bit I've used them, I never needed relations. They always seemed redundant, as I can always use JOIN to achieve what it seemed to me they are made for. Are they redundant, or is there anything you can do with relations that you cannot do with JOIN? Or am I just talking nonsense?
Relations are not just about joins for SQL queries. Relations provide many benefits:
Data integrity
Query convenience
Third party tool integration benefits
"Self-describing" data model to future DBAs/developers working with the database
Etc
Data integrity:
Relations help to ensure that your "order records" can't exist without a "customer record", for example. Simply by defining a relationship between customer and order, the database will ensure that this cannot happen. This helps to make sure that your database doesn't become a big pile of junk data.
Query convenience:
Relations can make it easier to do certain types of queries. Deleting a customer record can automatically have the customer's orders deleted at the same time, thanks to the relationship between customer and order
Third party tool integration benefits
Many third party tools (O/R tools come to mind) rely on relations in order to work properly
Really, the list could go on and on...you should use them, they're very beneficial. Even if you don't perceive the value today, if you're working on a database project that will continue to grow over a long period of time, it would be to your benefit to set relationships up from the beginning.
I think that they're not that critical for small projects/one-off data models...but for anything of substance, you're better off using them.
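The data-integrity and cascading-delete points above can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the `customer` and `orders` tables and their columns are hypothetical, and the syntax for declaring a cascade differs a little between DBMSs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL
                 REFERENCES customer(cust_id) ON DELETE CASCADE
    );
    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO orders VALUES (100, 1), (101, 1);
""")

# Data integrity: an order for a customer that does not exist is refused.
try:
    con.execute("INSERT INTO orders VALUES (102, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Query convenience: deleting the customer removes their orders as well.
con.execute("DELETE FROM customer WHERE cust_id = 1")
orders_left = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Without the `REFERENCES ... ON DELETE CASCADE` clause, both behaviours would have to be re-implemented in every application that writes to these tables.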
A RELATION is a subset of the cartesian product of a set of domains (http://mathworld.wolfram.com/Relation.html). In everyday terms a relation (or more specifically a relation variable) is the data structure that most people refer to as a table (although tables in SQL do not necessarily qualify as relations).
Relations are the basis of the relational database model.
Relationships are something different. A relationship is a semantic "association among things".
I think you are actually asking about referential integrity constraints (foreign keys). A foreign key is a data integrity rule that ensures the database is consistent by preventing inconsistent data from being added to it. Don't confuse foreign keys with relations because they are very different things.
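The mathematical definition quoted above is short enough to check directly. A small Python sketch, with made-up domains `names` and `depts`:

```python
from itertools import product

# Two domains (hypothetical values).
names = {"Smith", "Doodle"}
depts = {"sales", "ops"}

# The cartesian product is every possible (name, dept) pairing.
cartesian = set(product(names, depts))

# A relation is any subset of it: here, the pairs for which
# "employee NAME works in department DEPT" is true.
works_in = {("Smith", "sales"), ("Doodle", "ops")}

is_relation = works_in <= cartesian  # subset test
```

Nothing in this definition mentions foreign keys or links between tables; a single table on its own is already a relation.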
I'm assuming when you are reading about relations it is probably referring to foreign keys. If that's true, relations and joins are not different solutions for the same problem. They are 2 tools that accomplish different things, and they are usually used together.
A join, as it sounds like you know, is part of a select query that lets you get rows from more than one table.
A relation is part of the database structure itself that defines a rule. For example, if you had a city table and a country table, you should have a relation pointing each row in the city table to a row in the country table. This would ensure the integrity of the data and not allow a city row to point to a country row that doesn't exist.
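To see that the two tools do different jobs, here is a sketch of the city/country example with no foreign key declared, using Python's built-in `sqlite3` module. The column names and sample rows are assumptions for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE city (name TEXT, country_code TEXT);  -- no FK declared
    INSERT INTO country VALUES ('FR', 'France');
    -- 'XX' matches no country, yet nothing stops the insert
    INSERT INTO city VALUES ('Paris', 'FR'), ('Atlantis', 'XX');
""")

# The join works fine without any declared constraint...
joined = con.execute("""
    SELECT city.name, country.name
    FROM city JOIN country ON city.country_code = country.code
""").fetchall()

# ...but the inconsistent row is still stored; it just drops out of the join.
city_count = con.execute("SELECT COUNT(*) FROM city").fetchone()[0]
```

The join answers the query either way; only a declared constraint would have kept the bad row out in the first place.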
Asking "Why use relations when you can use joins?" sounds to me like asking "Why do variables have types when I could read them anyway?".
The theory behind databases is based on something called Relational Algebra. Relation is not a database specific term, it is derived from Relational Algebra.
The result of a JOIN is itself a kind of relation; there can be different kinds of relations. Refer to this wiki page to know more about what a Relation exactly is.
The relationships established in a RELATIONAL database are the very core of the relational database model. In a database, we model entities. We use relationships between entities to maintain data integrity, and ensure the records are organized properly. In some DBMSs, declaring a relationship also creates an index on the related columns.
If you are not using the relationships, and/or modelling your table structure based upon the relationships between discrete entities, then you are not harnessing the true power of your relational database. Yes, you can make queries work, and yes, you can get the Db to do some useful work. But can you ensure that, say, every Employee record is properly RELATED to the proper company? Can you ensure that there is only one record for that company, and that all the employees of that company are related to that record?
Without designing your database structure around entities and the relationships between them, you might as well use a spreadsheet, or one big, flat table. RELATIONSHIPS and NORMALIZATION form the basis of the modern relational database.
An SQL table is an approximation of a relational model relation. Tables/relations (bases, views & query results) represent relations/relationships/associations. These are boxes & diamonds on ER (Entity-Relationship) & pseudo-ER diagrams. Most lines on such diagrams correspond to FK (foreign key) constraints. They are frequently but wrongly called "relations" or "relationships" but they are not. They are facts. An SQL FK says that a table's subrows appear elsewhere where they are a PK (primary key) or UNIQUE. Equivalently, it says that an entity participating in a relation/relationship/association also participates once in another one. Table meanings are necessary & sufficient to query. Constraints--including PKs, UNIQUEs & FKs--are not needed to query. They are consequences of the table relation/relationship/association choices & what situations/states can arise. They are for integrity to be enforced by the DBMS.
When Ed Codd developed the relational model of data for use with large scale databases, he based his design on the mathematics of relational calculus and algebra. The results of this kind of mathematics are predictable with mathematical precision, and Ed Codd was able to forecast with near mathematical precision how relational databases would behave before the first one was ever built.
In mathematics, a relation is a mathematical abstraction. It's a subset of the cartesian product of two or more domains, as another responder said. If that's as clear as mud to you, maybe you're not a mathematician.
No matter. A good computer scientist can understand SQL tables fairly easily, and recognize and exploit the power of an SQL JOIN. This understanding will do in place of a mathematical understanding of relations for many purposes. An SQL table materializes a mathematical relation, approximately. If you are careful with table design, you can turn "approximately" into "exactly".

Is the set of data always normalized in one form or the other in Databases

Suppose I have a set of data. Given the data and the relation schemas, can I assume that the set of data is normalized in one form or the other? In my opinion, raw data has to be normalized into some form. However, a discussion with a friend has led me to ask this question here.
To expand on the question: given a set of functional dependencies for a relation or table, is it guaranteed that the table would at least be in 1NF, if not the others?
The SQL model permits tables that are not true relations at all (tables with duplicate rows for example). So it certainly is not the case that SQL databases are always normalized. I expect most people who work with databases have come across examples of tables without keys.
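A quick way to see the duplicate-row point is to create a table without any key. This sketch uses Python's built-in `sqlite3` module; the `reading` table is made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE reading (sensor TEXT, value INTEGER);  -- no key declared
    INSERT INTO reading VALUES ('a', 1), ('a', 1);      -- accepted twice
""")

# The table is a bag (multiset) of rows, not a set, so it is not a relation.
bag = con.execute("SELECT sensor, value FROM reading").fetchall()

# DISTINCT recovers set semantics for a query result.
as_set = con.execute("SELECT DISTINCT sensor, value FROM reading").fetchall()
```

Declaring a `PRIMARY KEY` or `UNIQUE` constraint over the columns would make the second insert fail instead.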
In my experience, in real life, you'll run into databases that span the entire spectrum from perfectly normalised to a complete mishmash of weird stuff that you'd never expect to see anywhere, let alone in a database.
You certainly can't expect to find data in even 1NF (which is supposedly the most "basic" normalisation form).
Also, that's not to say that normalised data should be the "goal" of database design. You need to weigh clean design against performance criteria, and a fully normalised database can often work against your performance criteria.
No, the raw data are not necessarily normalized in 1NF. It depends on the schemas. There can be a schema design which is not even in 1NF.
Let me give you an example,
Suppose there is a table which stores a student's marks for every semester, and the student with studentid 5 has marks in each semester. Assume the schema for Marks is
Marks(studentid,marks)
This table is not in 1NF, because the marks for studentid 5 form a repeating group of values instead of one value per row.
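One possible reading of this example is that all of a student's marks end up packed into a single `marks` cell. Under that assumption, the following sketch (Python's built-in `sqlite3` module; the `semester` column and the sample marks are invented) shows the non-1NF table and a 1NF version of the same data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- not in 1NF (assumed reading): one cell holds a list of marks
    CREATE TABLE marks_raw (studentid INTEGER, marks TEXT);
    INSERT INTO marks_raw VALUES (5, '71,64,80');

    -- 1NF: one value per cell, one row per (student, semester)
    CREATE TABLE marks (
        studentid INTEGER,
        semester  INTEGER,
        mark      INTEGER,
        PRIMARY KEY (studentid, semester)
    );
""")

# Unpack each list into separate rows, numbering semesters from 1.
raw_rows = con.execute("SELECT studentid, marks FROM marks_raw").fetchall()
for studentid, packed in raw_rows:
    for semester, mark in enumerate(packed.split(","), start=1):
        con.execute(
            "INSERT INTO marks VALUES (?, ?, ?)",
            (studentid, semester, int(mark)),
        )

normalized = con.execute(
    "SELECT studentid, semester, mark FROM marks ORDER BY semester"
).fetchall()
```

In the 1NF version each mark can be queried, constrained and updated on its own, which the packed string does not allow.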
I'd point you to another conversation on SO.
What are 1NF, 2NF and 3NF in database design?
