Supertype/subtype Notation for ERD - database

This is more of a notation and 'proper procedure' type of question than anything.
Please see below an image of a few relations in my Enhanced ERD logical model. A patient can be an OUTPATIENT or a RESIDENT, but there are no attributes which are specific to OUTPATIENTS or RESIDENTS. There are relationships which are specific to the subtypes though, as only OUTPATIENTS can be associated with visits and only RESIDENTs can be associated with beds.
I am in the process of converting this to a physical data model. Obviously it makes sense to not have OUTPATIENT or RESIDENT tables and only a PATIENT table which contains a discriminator for the type of patient.
But what is the proper way to model this?
How do I now model the relationships to VISITS and BEDS while still maintaining the constraint that the discriminator must be of a certain value to qualify for those relationships?
Do I just forget about representing this constraint in the physical data model and make sure it's implemented in the code when the tables are created?
Or is there a notation for physical data models which represents this type of constraint?
Section of CareCenter schema in Extended ERD
I have done much searching and cannot seem to find anything about this. All of the material I have found talks about creating subtypes for the purpose of isolating attributes specific to a subtype and not relationships specific to a subtype.
Advice or reference to data you have found that I was not able to is greatly appreciated!
(If you are really trying to make sense of my section of EERD it may be helpful to know that PATIENT is a subtype of a PERSON supertype.)

1  Modelling & Notation
1.1  ERD
is pre-Relational, 1960’s.  It cannot handle Relational Keys, which means it is hopeless for Relational Data Modelling.  In the Relational paradigm, the Relational Key (which is composite) is central, therefore the identity of each entity cannot be analysed, or modelled, or defined, in ERD.
There is no definition in ERD for the Relational concepts of Independent/Dependent tables, or Identifying/Non-Identifying relations, as it is meaningless without a Relational Key, which leads to much confusion when extending ERD and attempting to add those.  Further, as you have found, it has no notation of Domain/Datatype; Subtype; etc.
ERD never was a Standard.  Since it is un-useable, each person who attempts to use it for an SQL implementation has to “extend” ERD, and that results in a million notations, all of which are different and incomplete.  And which have to be explained to the reader.  Whereas a Standard needs no explanation because it is complete and documented, once.
Technically, ERD is not a model (which implies a mathematical, logical basis).  The semantics are primitive and nowhere near complete.  In fact, it is hopeless for modelling, period, even for pre-Relational filing systems.
1.2  IDEF1X
is the Standard for Relational Data Modelling, available since the 1980's, a Standard since 1993.  As such it is complete, whereas an extended ERD will never be complete, no matter how much you extend it.
The academics and authors of "textbooks" are clueless: as evidenced, they are 50 years behind the industry (definition) and 40 years behind (implementation on SQL platforms).  They are stuck in 1960's Record Filing Systems, which is physical, characterised by a RecordID, and they market it as "relational".  
Whereas Codd's Relational Model is completely logical, with a mathematical foundation, and provides far more Integrity; Power; and Speed.
To use ERD at all, you have to extend it, using some private notation, as you have done.  Instead of moving incrementally and painfully in the direction of IDEF1X, I suggest you just switch to it, and obtain the full benefit.  You may find this IDEF1X Introduction useful.
1.3  Logical vs Physical Data Model
There is a lot of nonsense written about the distinction.
The Logical model simply progresses, in iterations, to the point where it is stable, and then it is the Physical, which can be implemented on a specific SQL platform.  That is, there is no “convert” process.
In good Data Modelling tools, such as ERwin, it is one file, not two or three, and the Logical vs Physical is simply different views of that one file.  Eg. Domain in the Logical is DataType in the Physical. The Physical is of course specific to the target platform, eg. BOOLEAN in one is BIT in another.  If you are not using a Data Modelling tool, or using a poor one, sure, you will have separate files and you have to deal with the attendant synchronisation problems.
But what is the proper way to model this? How do I now model the relationships to visits and beds while still maintaining the constraint that the discriminator must be of a certain value to qualify for those relationships?
In this regard, the question is not about Logical vs Physical DM, all aspects re the question are implemented in both.
Yes, it is about notation. There is no notation problem, or difference (Logical vs Physical) in IDEF1X, because it is complete.
Do I just forget about representing this constraint in the physical data model
No, they are drawn in both, they are implemented in the DDL.
and make sure it's implemented in the code when the tables are created?
If you use a Data Modelling tool, it squirts out SQL that is specific to the target platform. Otherwise, sure, you have to write your own DDL and make sure it is correct. In any case, the SQL is the same (not counting the difference in SQL flavours).
Caveat.  The pretend SQLs (all freeware “sqls” and Oracle) are not SQL compliant, their use of the term is not correct.  They cannot implement ordinary SQL features such as Constraints for Subtypes or ACID Transactions; etc.
Or is there a notation for physical data models which represents this type of constraint?
No, there is no difference in the notation in IDEF1X. Your question appears to be due to your extensions to ERD. First, the ERD is not useable for Relational data modelling, and cannot cope with Relational Keys or Subtypes.  Second, your extensions, good as they may be, do not have the ordinary Relational notation that IDEF1X has. Again, just switch to IDEF1X.
2  Codd’s Relational Model
As distinct from the variety of primitive nonsense written by the academics and in textbooks, misleadingly marketed as “relational”.
2.1  Subtype
I have done much searching and cannot seem to find anything about this. All of the material I have found talks about creating subtypes for the purpose of isolating attributes specific to a subtype and not relationships specific to a subtype.
There is no problem at all with a Subtype that has no attributes, same as there is no problem at all with a row that has no attributes.  Keep in mind that each entity is a Fact (one fact in one place), and the Fact is established by the Relational Key, to which the attributes are quite secondary (Codd’s 3NF properly understood).  Thus Resident and OutPatient are discrete Facts, whether each Subtype has attributes or not; whether the Fact exists for supporting a Foreign Key or not, is a separate issue.
Advice or reference to data you have found that I was not able to is greatly appreciated
You may find this Subtype document useful.  For examples, go to my profile, and look up any answers that interest you.
If you require even further detail, there is a long discourse regarding Subtypes and notation, that I had with the single academic who is trying to cross the great chasm between academia and reality in this field, who recently "found" IDEF1X from my data models.  I use a corrected form of IDEF1X (it was written by an academic), using the pre-existing IEEE notation when it is more precise.  The discourse goes into the whys and wherefores of the original IDEF1X vs the corrected form.  It is long at 70 posts, and there is a document that summarises it. Just ask.
Obviously it makes sense to not have OUTPATIENT or RESIDENT tables and only a PATIENT table which contains a discriminator for the type of patient.
No.  Each Subtype is a separate table, in the Logical models (first) and Physical (last), and the DDL. The Physical is merely the implementation level of the Logical, you should not have anything in the Physical that is not in the Logical (you do not want to implement a thing that is not logical; not semantic; not Relational, which is absolutely logical, and unlimited).
Consider that the database may be expanded in the future, and you may have attributes in the Subtypes. 
- If the cluster is Exclusive, the Basetype table must have a Discriminator. 
- If it is Non-Exclusive, there is no Discriminator.
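As a rough sketch of what an Exclusive cluster with a Discriminator can look like in DDL, here is one widely used declarative technique: the Discriminator is repeated in each Subtype table, pinned by a CHECK, and tied back to the Basetype with a composite Foreign Key. This is an assumption on my part, not necessarily the method in the linked Subtype document. The column names (PatientId, PatientType), the codes 'R' and 'O', and the helper functions are hypothetical, and SQLite via Python is used only because it runs anywhere.

```python
import sqlite3

DDL = """
CREATE TABLE Patient (
    PatientId   INTEGER PRIMARY KEY,
    -- Discriminator: 'R' = Resident, 'O' = OutPatient (hypothetical codes)
    PatientType TEXT NOT NULL CHECK (PatientType IN ('R', 'O')),
    UNIQUE (PatientId, PatientType)   -- target of the composite FKs below
);
CREATE TABLE Resident (
    PatientId   INTEGER PRIMARY KEY,
    PatientType TEXT NOT NULL DEFAULT 'R' CHECK (PatientType = 'R'),
    FOREIGN KEY (PatientId, PatientType)
        REFERENCES Patient (PatientId, PatientType)
);
CREATE TABLE OutPatient (
    PatientId   INTEGER PRIMARY KEY,
    PatientType TEXT NOT NULL DEFAULT 'O' CHECK (PatientType = 'O'),
    FOREIGN KEY (PatientId, PatientType)
        REFERENCES Patient (PatientId, PatientType)
);
"""


def open_db():
    """Create an in-memory database with the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(DDL)
    return conn


def try_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, a Patient whose Discriminator is 'R' can get a Resident row but not an OutPatient row. Note what the sketch does not do: it does not force every Patient to have a Subtype row at all, which needs a transaction or a platform-specific constraint on top.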
Supertype means something quite different, the academics use terms loosely and incorrectly. Eg. the notion of Superkey is hysterical, and anti-Relational.
2.2  Data Model
Here is the logical model in IDEF1X notation, showing attributes, not domains.  
I have corrected a few errors: given the level of modelling that you have demonstrated, I don't think they need a full explanation.
Person Subtype is Non-Exclusive (no Discriminator)
Patient Subtype is Exclusive (needs a Discriminator)
That is to be used in your code to determine the Subtype, otherwise JOIN to the Subtype
Since Resident::Bed is 1::1, the attributes (Bed FK) can be located in Resident.  
This treatment ensures that the Bed that a Patient may be assigned to, exists.
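A minimal sketch of that point, assuming hypothetical column names (PatientId, BedNo) since the diagram is not reproduced here: the Bed Foreign Key sits in Resident, so a Resident cannot be assigned to a Bed that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Bed (
    BedNo INTEGER PRIMARY KEY
);
CREATE TABLE Resident (
    PatientId INTEGER PRIMARY KEY,
    BedNo     INTEGER NOT NULL REFERENCES Bed (BedNo)  -- the Bed must exist
);
INSERT INTO Bed (BedNo) VALUES (101);
""")


def assign(conn, patient_id, bed_no):
    """Try to record a Resident in a Bed; False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Resident (PatientId, BedNo) VALUES (?, ?)",
            (patient_id, bed_no),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

As written, this still lets two Residents share a Bed. If one Bed must hold one Resident only, a UNIQUE constraint on Resident.BedNo would be one way to declare it.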
Consider:
When an OutPatient visits the CareCenter, is not the purpose to obtain a treatment of some kind, which must be recorded ?
Is not the treatment obtained under a Physician’s control, and shouldn’t the treatment details be recorded ?
Therefore an OutPatient obtains a Treatment, same as a Resident, and it is common, in the Basetype.
Visit can be eliminated
(again, whether the treatment is received by a Resident or OutPatient regards the Subtype).
The data model in a PDF.
2.3  Predicate
The Predicates can be read directly from the graphic model, the evaluation of such provides an excellent feedback loop to the modelling process.  Please read them and verify.
Eg. the Predicate Each Bed accommodates 0-to-n Residents would cause a brawl that can be avoided.
Again, the academics and authors do not understand the Relational Model, and thus they are clueless about Predicates. For a good introduction, refer to Relational Table Naming Convention, the Relationship, Verb Phrase section at the top, and the Predicate section at the end.
2.4  Null
Nulls in a Relational database are a clear indication of a Normalisation error. I have removed them.
3  Outstanding
The academics and authors understand only 1960's physical Record Filing Systems (placed in an SQL container for convenience), thus they understand only Referential Integrity.  They do not understand Codd's Relational Model, thus they cannot understand, and they cannot teach, Relational Integrity, which is logical, and provides far more data integrity than 50-year-obsolete filing systems.
Your model allows any Physician to treat any Patient, which is typical for a RFS, if you follow the literature, but sub-normal for Relational.
I doubt that that is what you want in a database.  I think you want only the treating Physician, the ProviderNo to treat the Patient.  
As the model progresses, you may wish to ensure that a Bed is assigned to one Resident only. I didn’t model it because I need to be told: is admission and bed assignment two administrative steps or one ?
Do you not require lookup tables for Speciality and TreatmentName ?
Data Modelling is an iterative exercise: it is only when a model is erected, and contemplated, that the issues are exposed, which leads to the next iteration.

Related

Either or relational algebra enterprise constraint

I need to define a constraint where tuples in a booking table can only have a value in musician (foreign key attribute from musician table) or actor (foreign key attribute from actor table), and must have one of these, but not both. At first I came up with this solution -
1. select any tuple from booking, call it x;
2. project x's musician column, call it y;
3. project x's actor column, call it z;
4. count(y) + count(z) = 1;
This works but also unintentionally imposes the constraint that the 'empty' booking's musician and actor columns cannot contain an empty string. How can I fix this issue?
P.S. I'm aware that count() isn't always part of relational algebra but I am permitted to use it for this purpose.
Problem
The obstacles you are facing are these:
no clear separation between data analysis and problem or process analysis
resorting to relational calculus or any other theoretical concept to solve a practical (eg. data modelling) problem.
you are making assumptions on dependencies (or experiencing problems with) where the referred thing is not yet clearly defined
Solution
The solutions are:
first, model the data, and only as data, without regard to what you need to do in any given Process
the Data Model should reflect reality, the real world.
understand and appreciate the theory, but implement using practical methods. That is, straight Relational Data Modelling using the Standard for Relational Data Modelling, IDEF1X.
btw, "There are many RAs" is incorrect: there is just one Relational Calculus, by Dr E F Codd. Sure, there are many pretenders after him, but Codd's RA is the only one that is complete; resolved; universally known; and accepted. philipxy is one of those, they hate Codd.
finish the Data Model properly. Define the referred thing reasonably, before attempting to define the dependent thing.
Before you can model a Booking for exclusively {Actor|Musician}, you need to model {Actor|Musician} ... which is a Person
a Person can be {Actor|Musician|Both}, ie. non-exclusive
but the Booking for {Actor|Musician} needs to be exclusive.
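To make those three points concrete, here is a hedged sketch in SQLite via Python. The table and column names (ActorBooking, MusicianBooking, BookingType, the codes 'A' and 'M') and the helper function are my assumptions for illustration, and the composite Foreign Key is one common way to declare an Exclusive Subtype, not necessarily the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT NOT NULL);

-- Non-exclusive Subtypes: no Discriminator, a Person may be in both.
CREATE TABLE Actor    (PersonId INTEGER PRIMARY KEY REFERENCES Person (PersonId));
CREATE TABLE Musician (PersonId INTEGER PRIMARY KEY REFERENCES Person (PersonId));

-- Exclusive Subtypes: Booking carries a Discriminator.
CREATE TABLE Booking (
    BookingId   INTEGER PRIMARY KEY,
    BookingType TEXT NOT NULL CHECK (BookingType IN ('A', 'M')),
    UNIQUE (BookingId, BookingType)
);
CREATE TABLE ActorBooking (
    BookingId   INTEGER PRIMARY KEY,
    BookingType TEXT NOT NULL DEFAULT 'A' CHECK (BookingType = 'A'),
    ActorId     INTEGER NOT NULL REFERENCES Actor (PersonId),
    FOREIGN KEY (BookingId, BookingType)
        REFERENCES Booking (BookingId, BookingType)
);
CREATE TABLE MusicianBooking (
    BookingId   INTEGER PRIMARY KEY,
    BookingType TEXT NOT NULL DEFAULT 'M' CHECK (BookingType = 'M'),
    MusicianId  INTEGER NOT NULL REFERENCES Musician (PersonId),
    FOREIGN KEY (BookingId, BookingType)
        REFERENCES Booking (BookingId, BookingType)
);
""")


def accepted(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

There are no nullable musician or actor columns to count, so the empty-string problem in the question does not arise. A Booking of type 'A' can never acquire a MusicianBooking row. Making sure every Booking has its one Subtype row is a separate matter (typically the two inserts go in one transaction).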
Data Model
Easily modelled in the Relational paradigm. As a consequence, the SELECT is simple and straight-forward.
The Data Model in IDEF1X/ER Level (not ERD) is:
Notice how it is not a RA issue, but a Data Modelling issue. In two hierarchic locations.
Note
The Standard for Relational Data Modelling since 1993 is IDEF1X. For those unfamiliar with the Standard, refer to the short IDEF1X Introduction.
For full definition and usage considerations re Subtypes, refer to Subtype Definition.

Representing an either-or relationship in Crows foot ERD

I am working on practice questions for ERD, and I was wondering what the correct approach is for modelling either-or relationships.
For example, in a Taekwondo school, you will have customer accounts, which will represent and pay for one or many students. The account is owned by either a parent, or the student himself. Therefore the account owner is either a parent or a student. What is the best way to represent a relationship like this?
Here is what I came up with, but I am unsure if this conforms to best practice:
1 Clarification
Representing an either-or relationship in Crows foot ERD
The diagram you have is a good start. Note:
that is not ERD. That is way more detail than an ERD can handle
ERD does not have a Crows Foot, that is IEEE notation
Ultimately, you need a data model that has the detail required for an implementation (way more than ERD). That is why I said your diagram is a good start, it is moving in that direction. However, we have a Standard for Relational Data Modelling: IDEF1X, the Standard for modelling Relational databases since 1993, available since 1984 before it was elevated to a standard.
Evidently both Dr E F Codd's Relational Model, and the diagrammatic method for modelling Relational databases is suppressed.
The relationship symbol, especially the cardinality, in IEEE notation is better (more easily understood) than IDEF1X, therefore most people use that. All data modelling tools, such as ERwin, implement IDEF1X, and allow either IDEF1X or IEEE notation for relationships.
2 Request
The diagram as intended is illegal. Why ? Because you have one relationship going "out" of Person, to two tables. Not possible. You are asking how to represent such a relationship in a data model (not possible in ERD). The answer is, that is an OR Gate in logical terms, a Subtype in Relational terms.
Please inspect these answers for overview and detail. Follow the links for implementation details and code:
How can I relate a primary key field to multiple tables?
Structuring database relationships for tracking different variations of app settings
How do I get around this relational database design smell?
Subtypes can be:
Exclusive (the Basetype must be one of the Subtypes), or
Non-Exclusive (the Basetype must be any [more than one] of the Subtypes).
From Role it appears to be Exclusive. What you call Role is a Discriminator in IDEF1X.
That is best practice for Relational databases.
Relational Data Model
This is best practice for data models (this level of detail shows attribute name only).
Of course, all my data models are rendered in IDEF1X.
My IDEF1X Introduction is essential reading for beginners.
ParentId, StudentId, OwnerId are all RoleNames (Relational term) of PersonId. This makes the context of the FK explicit.
3 Correction
but I am unsure if this conforms to best practice
Since you are concerned, there is one other issue. There is a mistake in your model, it is one of the common errors that happen when one stamps id on every file. Such a practice cripples the modelling exercise, and makes it prone to various errors. (I understand that you are taught that crippling method.)
Since a Person can have 0-or-1 Account, and the Person PK (which is unique to a Person), is a FK in Account, it can be the PK in Account.
AccountId is not necessary: it is 100% redundant, one additional field and one additional index, that can be eliminated.
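A small sketch of that correction, under stated assumptions: Name and Balance are hypothetical attributes added only so the tables have something in them, and SQLite via Python is used for convenience. OwnerId (the RoleName of PersonId) is both the Primary Key of Account and the Foreign Key to Person, so no AccountId is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Person (
    PersonId INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL
);
CREATE TABLE Account (
    -- OwnerId is PK and FK at once: one key, one index
    OwnerId  INTEGER PRIMARY KEY REFERENCES Person (PersonId),
    Balance  NUMERIC NOT NULL DEFAULT 0
);
""")


def accepted(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

The Primary Key enforces the 0-or-1 cardinality (a second Account for the same Person is rejected) and the Foreign Key ensures the owner exists.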

Conceptual model vs Logical model vs Physical model

I'm doing a work about database, and now I need to show three different images, one image with the conceptual model, other with logical model and other with physical model of a database.
But I'm having some difficulty understanding which image represents each model.
I'm looking for reliable information about this, but I find different answers and I'm a bit confused.
So I came here to see if you can help me.
I have below my three images, do you think I have the correct title for each image?
Conceptual model:
In the conceptual model, I think that I need to put my tables with attributes but without relationships.
Logical Model:
In the logical model, I think I need to put my tables with attributes, but now with my relationships.
Physical Model:
In the physical model, I think I need to put my tables with attributes, but now with my relationships and also with foreign keys.
A Conceptual Model (CM) is an informal representation of the business represented in a manner that is understood by users. It will consist of classes of entities with attributes and the business rules regarding these. It is often presented as Entity-Relationship Diagrams.
A Logical Model (LM) formalizes the CM into data structures and integrity constraints. It should include all the data structures and integrity constraints for the data (this is all constraints, not just that subset of constraints that are easily defined in most available database management systems). It is database management system agnostic.
The LM may be presented as a Relational Data Model (RDM). In which case all the data structures and integrity constraints will be formally represented only using mathematical relations.
A Physical Model (PM) is a representation of the LM on specific hardware and database management system. It may consist of information such as storage sizing and placement; access methods such as indexing; and distribution such as clustering or partitioning.
Using these definitions I would say that all your diagrams are versions of Conceptual Models, as they do not include all the integrity constraints for the data being managed and do not include any information regarding an implementation on specific hardware or database management system.
The conceptual/logical/physical layers have changed somewhat over the years, and also vary according to different schools of thought. The way I learned it, back in the 1980s was this:
The conceptual model summarizes the semantics of the data with reference to the subject matter. It is not bound to a relational implementation. The implementation could be in some sort of prerelational database, or even in classical files of records. You have entities, relationships, attributes, and domains. You also have business rules. That's about it. Like your summary, it's primarily for communication with users and other stakeholders. The idea is to pin down the requirements during the analysis phase.
The logical model is a preliminary design. It's bound to the relational model, but not to a specific DBMS. You have relations, tuples, attributes, and constraints. Relationships are implemented as foreign keys, sometimes requiring junction relations. I tended to use the terminology of tables, rows, and columns, instead of relations, tuples, and attributes, but that's mostly nomenclature. Normalization is relevant here.
The physical model is a detailed design. It's DBMS specific, and takes into account data volume, expected traffic, and performance. Denormalization is relevant here. This leads directly to a creation script.
This is by no means the majority view, let alone a general consensus. You need to understand your audience to see if this framework works.
Is it a homework or what? The question seems so artificial...
The 3rd one is Physical because the data types are closer to actual DBMS data types.
Between the 1st, and 2nd ones... I'm stuck. The only difference is the crow-feet relationship. If there's a progression between the three images, I'd guess this would make the 2nd one the Conceptual.
But it is difficult because, with PowerDesigner, you could still represent the relationship with crow-feet in the Logical model. But anyway, there should be evidence of the migration of the "foreign key" attribute id_cat in the News entity, which is missing here.
Nope. I was reading my example diagrams too fast, there's no migration in Logical model.
So, just by elimination, I'd make the 1st one the Logical.

Database Normalization Vocabulary

There is a lot of material on database normalization available on Steve's Class and the Web. However, I still seem to lack very definite reasons when explaining normalization.
For example, for a simple design such as a table Item with a Type field, it makes sense to have the Type as a separate table. The reason I forwarded for that was if in future any need arose to add properties to the Type, it would be much easier with a separate table already existing.
Are there more reasons which can be shown to be obvious?
Check these out too:
An Introduction to Database Normalization
A Simple Guide to Five Normal Forms
in Relational Database Theory
This article says it better than I can:
There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.
Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database.
What is an "inconsistent dependency"? While it is intuitive for a user to look in the Customers table for the address of a particular customer, it may not make sense to look there for the salary of the employee who calls on that customer. The employee's salary is related to, or dependent on, the employee and thus should be moved to the Employees table. Inconsistent dependencies can make data difficult to access because the path to find the data may be missing or broken.
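To illustrate the address example, here is a minimal sketch using SQLite via Python. The Orders table, the column names and the sample rows are assumptions made up for the illustration; only the Customers table comes from the quoted text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    Address    TEXT NOT NULL          -- stored here and nowhere else
);
CREATE TABLE Orders (
    OrderId    INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customers (CustomerId),
    Item       TEXT NOT NULL
);
INSERT INTO Customers VALUES (1, 'Ann', '1 Old Road');
INSERT INTO Orders VALUES (100, 1, 'Desk'), (101, 1, 'Chair');
""")

# The address change is one UPDATE on one row, however many orders exist.
cur = conn.execute(
    "UPDATE Customers SET Address = ? WHERE CustomerId = ?",
    ("2 New Street", 1),
)
rows_changed = cur.rowcount

# Every order sees the new address through the join; nothing else to fix up.
order_addresses = conn.execute("""
    SELECT o.OrderId, c.Address
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerId = o.CustomerId
    ORDER BY o.OrderId
""").fetchall()
```

Had the address been copied into each order row, the same change would need one update per copy, and any missed copy would leave the data inconsistent.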
following links can be useful:
http://support.microsoft.com/kb/283878
http://neerajtripathi.wordpress.com/2010/01/12/normalization-of-data-base/
Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization. In his own words, its objectives are:
To free the collection of relations from undesirable insertion, update and deletion dependencies;
To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
To make the relational model more informative to users;
To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
— E.F. Codd, "Further Normalization of the Data Base Relational Model"
Taken word-for-word from Wikipedia:Database normalization

What is a "database entity" and what types of DBMS items are considered entities? [closed]

Is it things like tables? Or would it also include things like constraints, stored procedures, packages, etc.?
I've looked around the internet, but finding elementary answers to elementary questions is sometimes a little difficult.
That's quite a general question!
Basically, all types that the database system itself offers, like NUMERIC, VARCHAR etc., or that the programming language of choice offers (int, string etc.) would be considered "atomic" data(base) types.
Anything that you - based on your program's or business' requirements - build from that, business objects and so forth, are entities.
Tables, constraints and so forth are database-internal objects needed to store and retrieve data, but those are generally not considered "entities". The data stored in your tables, when retrieved and converted into an object, that then is an entity.
Marc
In the entity relationship world an entity is something that may exist independently and so there is often a one-to-one relationship between entities and database tables. However, this mapping is an implementation decision: For example, an ER diagram may contain three entities: Triangle, Square and Circle and these could potentially be modelled as a single table: Shape.
Also note that some database tables may represent relationships between entities.
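As a sketch of that implementation decision, assuming hypothetical column names (ShapeId, Kind, Size) and made-up rows, the three entities could share one table with a column saying which kind each row is:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Shape (
    ShapeId INTEGER PRIMARY KEY,
    -- Kind records which ER entity the row stands for
    Kind    TEXT NOT NULL CHECK (Kind IN ('Triangle', 'Square', 'Circle')),
    Size    REAL NOT NULL
);
INSERT INTO Shape (ShapeId, Kind, Size) VALUES
    (1, 'Triangle', 3.0),
    (2, 'Circle',   1.5),
    (3, 'Circle',   2.0);
""")


def shapes_of(kind):
    """Return the ids of all rows representing one ER entity type."""
    rows = conn.execute(
        "SELECT ShapeId FROM Shape WHERE Kind = ? ORDER BY ShapeId", (kind,)
    )
    return [row[0] for row in rows]
```

The ER diagram still has three entities; only the storage was merged, which is why the table list alone does not tell you what the entities are.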
This seems helpful: http://en.wikipedia.org/wiki/Entity-relationship_model
In a database an entity is a table. The table represents whatever real world concept you are trying to model (person, transaction, event).
Constraints can represent relationships between entities. These would be foreign keys. They also enforce rules like first_name can not be blank (null). A transaction must have 1 or more items. An event must have a date time.
Stored Procedures / Packages / Triggers could handle more complex relationships and/or they can handle business rules, just depends on what it's doing.
it kind of depends how you think about it and how you model your problem domain. most of the time when you hear about entities, they are database tables (one or many) mapped onto object classes. So it's not really an entity until it's been queried for and turned into a class instance.
but again, it depends on your modeling methodology, and there are multiple :-)
This thread is demonstrating one reason why it is difficult to find "elementary answers to elementary questions". Certain words have been used by different programming paradigms to mean different things (try asking a bunch of OO programmers what is the difference between a Class and an Object sometime).
Here's my take on it.
I first came across Entity as a modelling term in SSADM (ask your dad). In that context an Entity is used to model a logical clump of data during the requirements gathering / analysis phase. The relationships between entities were modelled using the Entity Relationship diagrams, and the profile of an Entity was modelled using Entity Life Histories. ELH diagrams were very useful in COBOL systems but utterly horrible in relational databases. ERDs on the other hand continue to be useful to this day.
During the design and implementation phases the Entities get resolved into database tables, objects or records in a COBOL input file. In the course of that process a logical entity may get split across multiple tables, or several entities may get squidged into a single table, or there may be a one-to-one mapping. Sometimes an entity is resolved away entirely or lingers on as a view or a stored procedure.
My answer is obviously a little late, but here it is as defined in a database certification text book:
Entity: A uniquely identifiable element about which data is stored in a database.
and to clear up entity and table confusion,
Entity is not a table. Tables can be called "tables" or "relations" the words are synonymous.
We'd need to know some context. One thing people sometimes do when analysing data in preparation for designing a database is to create an Entity Relationship Diagram, where you are considering what data items you are managing and their relationships.
I wonder if that's the context you mean?
If so perhaps a read of this article would get you started?
Entities are "things of significance" to the users/business/enterprise/problem domain.
Update:
See this article in my blog in which I try to cover the subject in more detail:
What is entity-relationship model?
An entity is a term from the entity-relationship model.
A relational model (your database schema) is one of the ways to implement the ER model.
Relational tables represent relations between simple types like integers and strings, which, in their turn, can represent everything: entities, attributes, relationships.
You cannot tell what it is only from the relational structure, you need to see the ER model.
For table persons,
id name surname
1 John Smith
id, name and surname are entities in the real world and may or may not represent entities in the underlying ER model.
The fact that a record exists in the table means that these entities are in the following relation: "person 1 has name John and has surname Smith".
In the example above, the entity is defined by id (from the model's point of view).
If a person changes his name from John to Jack, the person remains the same (again, from the model's point of view), but gets related to another name.
In the example above, name and surname can be treated as attributes (as opposed to entities), but again, you need to see the ER model which this schema implements to tell what they are.
In some ER-to-relational model mappings, an entity should be defined in a table referenceable with a FOREIGN KEY to be considered an entity (which should constrain its domain).
However, this constraint can exist but not be represented in a database (due to technological limitations or something else).
Like, we cannot keep a list of all possible names, but the name of ##$^# is most probably a non-name, hence, it does not belong to the domain of names.
Therefore, an attribute is an entity which can participate in a relationship but cannot be contained in a domain-defining table.
For instance, the table prices:
good_id price
defines relationships between the set of goods (which is defined by the table goods) and the set of real numbers (which cannot be contained in a table since it's not even countable).
Still each price (like $2.00) is a real-world entity just as well.

Resources