Good day,
Real estate companies have several Buildings; each Building is managed by one or more Managers, and each Manager has access to one or more Buildings. So there is a many-to-many relationship between Managers and Buildings, and it needs a table such as Permissions to resolve it.
Please help me figure out the best design for the database.
I came up with two candidate diagrams. Which one is better? If neither of them is good, what should I change?
http://i.stack.imgur.com/Z0l6h.png
http://i.stack.imgur.com/Dg5Sv.png
Sincerely
The second picture seems closest.
I'd suggest moving the boxes around a little to show the hierarchy. Put Companies top and center, then on the next row, Managers on the left, Buildings on the right and Permissions between those two.
ER diagrams are used for two different purposes. One purpose is to illustrate the subject matter entities, and the relationships between them, as understood by subject matter experts. This is called a conceptual model of the data.
The other purpose is to illustrate a proposed database design, one where the relationships are not only expressed, but also implemented somehow. If the design is relational (which it usually is) many-to-many relationships are expressed by creating an intermediate table. This is called a physical model of the data (in some literature it's called a logical model). This is what you have done in your second diagram.
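As a rough sketch of such a physical model, here are the tables from the question written as DDL and run through Python's built-in sqlite3 module. SQLite is an assumption for portability, since the question does not name a DBMS, and the column names, as well as hanging both Managers and Buildings off Companies, are hypothetical. The permissions table holds one row per manager/building pair, which is the intermediate table that implements the many-to-many relationship.

```python
import sqlite3

# Hypothetical physical model: the many-to-many between managers and
# buildings is resolved by the intermediate "permissions" table.
SCHEMA = """
CREATE TABLE companies (
    company_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE managers (
    manager_id INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL REFERENCES companies(company_id),
    name       TEXT NOT NULL
);
CREATE TABLE buildings (
    building_id INTEGER PRIMARY KEY,
    company_id  INTEGER NOT NULL REFERENCES companies(company_id),
    address     TEXT NOT NULL
);
-- One row per (manager, building) pair; the composite key blocks duplicates.
CREATE TABLE permissions (
    manager_id  INTEGER NOT NULL REFERENCES managers(manager_id),
    building_id INTEGER NOT NULL REFERENCES buildings(building_id),
    PRIMARY KEY (manager_id, building_id)
);
"""

def buildings_for_manager(conn, manager_id):
    """Return the addresses of every building a manager has access to."""
    rows = conn.execute(
        """SELECT b.address
           FROM permissions p
           JOIN buildings b ON b.building_id = p.building_id
           WHERE p.manager_id = ?
           ORDER BY b.address""",
        (manager_id,),
    )
    return [address for (address,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO companies VALUES (1, 'Acme Realty')")
conn.executemany("INSERT INTO managers VALUES (?, 1, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO buildings VALUES (?, 1, ?)", [(10, "1 Main St"), (11, "2 Oak Ave")])
# Ann manages both buildings; Bob manages only one.
conn.executemany("INSERT INTO permissions VALUES (?, ?)", [(1, 10), (1, 11), (2, 11)])
print(buildings_for_manager(conn, 1))  # ['1 Main St', '2 Oak Ave']
```

In a conceptual diagram the permissions table disappears and the line between Managers and Buildings simply gets a crow's foot at each end.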
Your first diagram could be cleaned up a little by eliminating the box named "permissions" and putting a crow's foot at both ends of the line connecting Managers and Buildings.
Now to come back to your question: which one is "better"? It depends. Sometimes a conceptual diagram is better for discussing the subject matter with the ultimate stakeholders: non-technical managers who work with the data all the time and might be called "subject matter experts".
A physical diagram is usually better when discussing the proposed design among data architects and programmers. It explains not only how the data works in concept, but also how the database is to be built. This kind of detail is glossed over by a conceptual model.
So you may end up with two diagrams, and use the appropriate one depending on your audience.
I am not very good at database diagramming. Whenever I am asked to create an ER diagram, I use MySQL Workbench.
However, today I got confused when I saw different types of ER diagrams. My diagrams (designed in MySQL Workbench) look like the one below.
I have also seen other types of ER diagrams, like the one below.
Can someone please confirm which type of ER diagram I should use?
An Entity Relationship Diagram is an example of a presentation of a Conceptual Model. A Conceptual Model is used to help people understand the subject area(s) the model represents. Therefore, the correct presentation of a Conceptual Model - which may be or include an Entity Relationship Diagram - is one that all interested parties are satisfied adequately explains these subject areas.
These interested parties should include potential users of a system that incorporates the subject areas, managers of these areas and IT professionals who will be designing and building a system covering these areas.
The agreed Conceptual Model is then taken by the IT professionals and formalized into a Logical Model, which may be presented as a Relational Data Model.
Actually, both of them are ER diagrams. The second one uses the more formal, textbook notation, while MySQL Workbench uses a notation that is easier to understand.
OK, I can find hundreds of references on the internet about the difference between top-down and bottom-up database design approaches; however, I can't seem to find any real-world examples, or any information on which approach is more suitable in which circumstances.
Can anyone help me out?
I'm basing this answer on this Data Modeling Wikipedia article.
About halfway down the Wikipedia page, there's a section called "Modeling methodologies".
A top down approach is used to create a new database. You model the objects at a logical level, then you apply the objects to a physical database design. For example, a relational database would need the objects to be mapped to tables.
To use a real world example, a payroll system would have to have person objects, along with other objects that hold pay rules (overtime for over 40 hours a week, overtime for more than 10 hours a day, etc.). There would be a pay period object, which holds the dates of the pay period and the pay day. This description isn't a complete design. As you think about the application more, you come up with additional objects that need to exist, and additional entities that need to be part of existing objects.
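A minimal sketch of that top-down path for the payroll example, assuming Python dataclasses stand in for the logical objects and SQLite for the physical database. The class and field names are hypothetical, and both the type mapping and the "first field is the primary key" rule are conventions chosen only for this illustration.

```python
import datetime
import re
import sqlite3
from dataclasses import dataclass, fields

# Logical level: plain objects with no storage details (hypothetical fields).
@dataclass
class Person:
    person_id: int
    name: str

@dataclass
class PayPeriod:
    pay_period_id: int
    start_date: datetime.date
    end_date: datetime.date
    pay_day: datetime.date

@dataclass
class PayRule:
    pay_rule_id: int
    description: str        # e.g. "overtime for over 40 hours a week"
    threshold_hours: float  # 40 (per week) or 10 (per day)
    per: str                # "week" or "day"

# Assumed mapping from Python types to SQLite column types (dates as ISO text).
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL", datetime.date: "TEXT"}

def to_ddl(cls):
    """Physical level: map one logical class to a CREATE TABLE statement."""
    table = re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower()  # PayPeriod -> pay_period
    cols = []
    for i, f in enumerate(fields(cls)):
        # Convention assumed here: the first field is the primary key.
        pk = " PRIMARY KEY" if i == 0 else ""
        cols.append(f"{f.name} {SQL_TYPES[f.type]}{pk}")
    return f"CREATE TABLE {table} (\n    " + ",\n    ".join(cols) + "\n)"

conn = sqlite3.connect(":memory:")
for cls in (Person, PayPeriod, PayRule):
    ddl = to_ddl(cls)
    print(ddl)
    conn.execute(ddl)
```

Adding an object or attribute at the logical level and regenerating the DDL mirrors the "think about the application more, add more objects" loop described above.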
A bottom up approach is used to migrate a database from one physical database to another. Migrating from Oracle to IBM's DB2 usually requires some changes, as the column data types are not completely compatible. You would create tables based on the existing tables. Sometimes, you try to make a near exact copy, to minimize the application coding changes. Other times, you alter the table structure, usually to normalize further or to group columns together in a more logical way. Yes, the application code would have to change to accommodate the new database schema. But sometimes, the pain is worth the gain.
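The bottom-up case can be sketched as "read the existing table definitions, remap the column types, recreate the tables, copy the rows." The sketch below uses two SQLite connections as stand-ins for the source and target databases, and its two-entry TYPE_MAP is a hypothetical placeholder, not a real Oracle-to-DB2 mapping. Defaults, indexes and constraints other than NOT NULL and the primary key are deliberately ignored.

```python
import sqlite3

# Hypothetical source-to-target type map. A real migration needs a longer,
# carefully checked list; these two entries only illustrate the idea.
TYPE_MAP = {"VARCHAR2": "VARCHAR", "NUMBER": "DECIMAL"}

def map_type(col_type):
    """Swap the base type name, keeping any "(length,scale)" suffix."""
    base, paren, rest = col_type.partition("(")
    return TYPE_MAP.get(base.upper(), base) + paren + rest

def copy_table(src, dst, table):
    """Recreate `table` in dst from src's column definitions, then copy its rows.
    `table` must come from a trusted list: identifiers cannot be bound as parameters."""
    info = src.execute(f'PRAGMA table_info("{table}")').fetchall()
    defs, pk_cols, names = [], [], []
    # Each row: (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, col_type, notnull, _dflt, pk in info:
        names.append(name)
        defs.append(f'"{name}" {map_type(col_type)}' + (" NOT NULL" if notnull else ""))
        if pk:
            pk_cols.append((pk, name))
    if pk_cols:
        defs.append("PRIMARY KEY (" + ", ".join(f'"{n}"' for _, n in sorted(pk_cols)) + ")")
    dst.execute(f'CREATE TABLE "{table}" ({", ".join(defs)})')

    col_list = ", ".join(f'"{n}"' for n in names)
    placeholders = ", ".join("?" for _ in names)
    rows = src.execute(f'SELECT {col_list} FROM "{table}"').fetchall()
    dst.executemany(f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', rows)
    dst.commit()
    return len(rows)

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customer (id INTEGER, name VARCHAR2(50) NOT NULL, "
            "balance NUMBER(10,2), PRIMARY KEY (id))")
src.executemany("INSERT INTO customer VALUES (?, ?, ?)", [(1, "Ann", 12.5), (2, "Bob", None)])
dst = sqlite3.connect(":memory:")
print(copy_table(src, dst, "customer"))  # 2
```

The "near exact copy" option corresponds to an almost empty TYPE_MAP; restructuring tables for further normalization is where the hand work, and the application changes, come in.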
I've seen lots of database migrations. They're hard to describe in a post. They are painful to work through.
To understand the differences between these approaches, let's consider some jobs that are bottom-up in nature. In statistical analysis, analysts are taught to take a small sample of a population and then infer results for the overall population. Physicians are also trained in the bottom-up approach: doctors examine specific symptoms and then infer the general disease that causes them.
Examples of jobs that require the top-down approach include project management and engineering tasks, where the overall requirements must be specified before the details can be understood. For example, an automobile manufacturer must follow a top-down approach to meet the overall specifications for a car. Suppose a car must cost less than 15,000 dollars, get 25 miles per gallon, and seat five people. To meet these requirements, the designers must start by creating a specification document and then drill down to meet each requirement.
Taken from http://www.dba-oracle.com/t_object_top_down_bottom_up.htm
We have to redesign a legacy POI database from MySQL to PostgreSQL. Currently all entities have 80-120+ attributes that represent individual properties.
We have been asked to aim for flexibility as well as a good design approach for the new database. The new design should allow:
any number of attributes/properties for any entity, i.e. the number of attributes for an entity is not fixed and may change on a regular basis;
content admins to add new properties to existing entities on the fly through admin interfaces, rather than changing the DB schema all the time.
There are quite a few discussions about the performance issues of EAV, but if we don't go with a hybrid EAV design we end up:
having lots of empty columns (we still add new columns even if 99% of the data does not have those properties);
spending more time maintaining the database, especially when attributes keep changing;
having no way of allowing content admins to add new properties to existing entities.
Anyway, here's what we are thinking for the new design (basic ERD included):
Have a separate table for each entity, containing the basic info that is exclusive to it, e.g. id, name, address, contact, created, etc.
Have 2 tables, attribute type and attribute, to store property information.
Link each entity to an attribute using a many-to-many relation.
Store addresses in a separate table and link them to entities using a foreign key.
We think this will allow us to be more flexible when adding, removing or updating properties.
This design, however, will result in an increased number of joins when fetching data, e.g. to display all "attributes" for a given stadium we might have a query with 20+ joins to fetch all related attributes in a single row.
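One possible reading of the proposed design, sketched in SQLite through Python's sqlite3 module. Table and column names are hypothetical, and storing the value on the entity-attribute link table is an assumption, since the question does not say where values live. Fetching a stadium's attributes as rows needs a single join; building one row per stadium needs one LEFT JOIN per attribute, which is where the 20+ joins come from.

```python
import sqlite3

SCHEMA = """
CREATE TABLE stadium (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attribute_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE attribute (
    id                INTEGER PRIMARY KEY,
    attribute_type_id INTEGER NOT NULL REFERENCES attribute_type(id),
    name              TEXT NOT NULL UNIQUE
);
-- Many-to-many link between an entity and its attributes, carrying the value.
CREATE TABLE stadium_attribute (
    stadium_id   INTEGER NOT NULL REFERENCES stadium(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    value        TEXT,
    PRIMARY KEY (stadium_id, attribute_id)
);
"""

def attributes_as_rows(conn, stadium_id):
    """One row per attribute: a single join, but the result is 'tall', not one record."""
    return conn.execute(
        """SELECT a.name, sa.value
           FROM stadium_attribute sa JOIN attribute a ON a.id = sa.attribute_id
           WHERE sa.stadium_id = ? ORDER BY a.name""",
        (stadium_id,),
    ).fetchall()

def single_row_query(attr_names):
    """Build the 'one row per stadium' query: one LEFT JOIN per attribute.
    Attribute names are bound as parameters, in the same order as attr_names."""
    select = ["s.id", "s.name"]
    joins = []
    for i, _ in enumerate(attr_names):
        alias = f"v{i}"
        select.append(f"{alias}.value AS attr_{i}")
        joins.append(
            f"LEFT JOIN stadium_attribute {alias} ON {alias}.stadium_id = s.id "
            f"AND {alias}.attribute_id = (SELECT id FROM attribute WHERE name = ?)"
        )
    return "SELECT " + ", ".join(select) + " FROM stadium s " + " ".join(joins)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO stadium VALUES (1, 'Arena')")
conn.executemany("INSERT INTO attribute_type VALUES (?, ?)", [(1, "number"), (2, "text")])
conn.executemany("INSERT INTO attribute VALUES (?, ?, ?)", [(1, 1, "capacity"), (2, 2, "roof")])
conn.executemany("INSERT INTO stadium_attribute VALUES (1, ?, ?)", [(1, "50000"), (2, "retractable")])
names = ["capacity", "roof"]
print(conn.execute(single_row_query(names), names).fetchone())  # (1, 'Arena', '50000', 'retractable')
```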
What are your thoughts on this design, and what would you advise to improve it?
Thank you for reading.
I'm maintaining a 10 year old system that has a central EAV model with 10M+ entities, 500M+ values and hundreds of attributes. Some design considerations from my experience:
If you have any business logic that applies to a specific attribute it's worth having that attribute as an explicit column. The EAV attributes should really be stuff that is generic, the application shouldn't distinguish attribute A from attribute B. If you find a literal reference to an EAV attribute in the code, odds are that it should be an explicit column.
Having significant amounts of empty columns isn't a big technical issue. It does need good coding and documentation practices to compartmentalize different concerns that end up in one table:
Have conventions and rules that let you know which part of your application reads and modifies which part of the data.
Use views to ease poking around the database with debugging tools.
Create and maintain test data generators so you can easily create schema-conforming dummy data for the parts of the model that you are not currently interested in.
Use rigorous database versioning. The only way to make schema changes should be via a tool that keeps track of and applies change scripts. PostgreSQL has transactional DDL, which is a killer feature for automating schema changes.
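A minimal sketch of "apply change scripts through a tool that tracks them", using SQLite as a stand-in: like PostgreSQL, SQLite supports transactional DDL, so a failing script leaves no half-applied schema change. The migration names, the schema_version table and the migrate() helper are hypothetical; dedicated migration tools do the same bookkeeping with more care.

```python
import sqlite3

# Hypothetical ordered change scripts; each is a list of SQL statements.
MIGRATIONS = [
    ("001_create_stadium", ["CREATE TABLE stadium (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"]),
    ("002_add_capacity", ["ALTER TABLE stadium ADD COLUMN capacity INTEGER"]),
]

def migrate(conn, migrations):
    """Apply pending migrations, each in its own transaction; return the names applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_version")}
    newly_applied = []
    for name, statements in migrations:
        if name in applied:
            continue
        conn.execute("BEGIN")
        try:
            for stmt in statements:
                conn.execute(stmt)
            # Record the script in the same transaction as its DDL.
            conn.execute("INSERT INTO schema_version (name) VALUES (?)", (name,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # DDL is rolled back too
            raise
        newly_applied.append(name)
    return newly_applied

# isolation_level=None: the sqlite3 module issues no implicit BEGIN/COMMIT,
# so the explicit transaction control above is in charge.
conn = sqlite3.connect(":memory:", isolation_level=None)
print(migrate(conn, MIGRATIONS))  # ['001_create_stadium', '002_add_capacity']
print(migrate(conn, MIGRATIONS))  # []
```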
PostgreSQL doesn't really like skinny tables. Each attribute value results in 32 bytes of data storage overhead, in addition to the extra work of traversing all the rows to pull the data together. If you mostly read and write the attributes as a batch, consider serializing the data into the row in some way. attr_ids int[], attr_values text[] is one option, hstore is another, or something client-side, like json or protobuf, if you don't need to touch anything specific on the database side.
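A minimal sketch of serializing the generic attributes into the row, using SQLite and a JSON text column as stand-ins for the PostgreSQL options above (arrays, hstore, or client-side json). The poi table and its columns are hypothetical. Explicit columns stay explicit; only the attributes that are read and written as a batch go into the serialized column.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE poi (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,             -- explicit column: the application relies on it
    attrs TEXT NOT NULL DEFAULT '{}' -- generic attributes, serialized as JSON
)
"""

def save_attrs(conn, poi_id, attrs):
    """Write all generic attributes of one POI in a single UPDATE."""
    conn.execute("UPDATE poi SET attrs = ? WHERE id = ?",
                 (json.dumps(attrs, sort_keys=True), poi_id))

def load_attrs(conn, poi_id):
    """Read all generic attributes of one POI with a single row fetch."""
    (raw,) = conn.execute("SELECT attrs FROM poi WHERE id = ?", (poi_id,)).fetchone()
    return json.loads(raw)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute("INSERT INTO poi (id, name) VALUES (1, 'City Stadium')")
save_attrs(conn, 1, {"capacity": 50000, "roof": "retractable"})
print(load_attrs(conn, 1))  # {'capacity': 50000, 'roof': 'retractable'}
```

The trade-off is the one the answer names: the database no longer sees individual attributes, so this fits best when nothing on the database side needs to query them one by one.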
Don't go out of your way to put everything into one single entity table. If the entities don't share any attributes in a sensible way, use multiple instantiations of the specific EAV pattern you use. But do try to use the same pattern and share any accessor code between the different instantiations. You can always parametrise the code on the entity name.
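A sketch of one EAV pattern instantiated per entity type, with the accessor code shared and parametrised on the entity name. The entity names, the `<entity>_attr` table layout and the helper functions are hypothetical. Because table names cannot be bound as query parameters, the entity name is checked against a whitelist before it is interpolated into SQL.

```python
import sqlite3

# Hypothetical entity types, each getting its own copy of the same EAV pattern.
ENTITIES = {"stadium", "museum"}

def _check(entity):
    # Identifiers cannot be bound as ? parameters, so whitelist them instead.
    if entity not in ENTITIES:
        raise ValueError(f"unknown entity: {entity}")

def create_eav_tables(conn, entity):
    """Create <entity> and <entity>_attr following the shared pattern."""
    _check(entity)
    conn.executescript(f"""
        CREATE TABLE {entity} (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE {entity}_attr (
            {entity}_id INTEGER NOT NULL REFERENCES {entity}(id),
            attr        TEXT NOT NULL,
            value       TEXT,
            PRIMARY KEY ({entity}_id, attr)
        );
    """)

def set_attr(conn, entity, entity_id, attr, value):
    _check(entity)
    conn.execute(f"INSERT OR REPLACE INTO {entity}_attr VALUES (?, ?, ?)",
                 (entity_id, attr, value))

def get_attrs(conn, entity, entity_id):
    _check(entity)
    rows = conn.execute(f"SELECT attr, value FROM {entity}_attr WHERE {entity}_id = ?",
                        (entity_id,))
    return dict(rows)

conn = sqlite3.connect(":memory:")
for entity in ("stadium", "museum"):
    create_eav_tables(conn, entity)
conn.execute("INSERT INTO stadium VALUES (1, 'City Stadium')")
set_attr(conn, "stadium", 1, "capacity", "50000")
print(get_attrs(conn, "stadium", 1))  # {'capacity': '50000'}
```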
Always keep in mind that code is data and data is code. You need to find the correct balance between pushing decisions into the meta-model and expressing them as code. If you make the meta-model do too much, modifying it will need the same kind of ability to understand the system, versioning tools, QA procedures, staging as your code, but it will have none of the tools. In essence you will be doing programming in a very awkward non-standard language. On the other hand, if you leave too much in the code, every trivial change will need a new version of your software. People tend to err on the side of making the meta-model too complex. Building developer tools for meta-models is hard and tedious work and has limited benefit. On the other hand, making the release process cheaper by automating everything that happens from commit to deploy has many side benefits.
EAV can be useful for some scenarios. But it is a little like "the dark side". Powerful, flexible and very seductive it is. But it's something of an easy way out: an easy way out of doing proper analysis and design.
I think "entity" is a bit too general. You seem to have some idea of what should be connected to that entity, like address and contact. What if you decide to have "Books" in the model? Would they also have addresses and contacts? I think you should try to find the right generalizations and keep the EAV parts of the model to a minimum. Whenever you find yourself wanting to show a certain subset of the attributes, test for the existence of a value, or determine behaviour based on a value, you should really have it modelled as a column.
You will not get a better opportunity to design this system than now. The requirements are known since the previous version, and also what worked and what didn't. (Just don't fall victim to the Second System Effect)
One good implementation of EAV can be found in Magento, an e-commerce CMS. There is a lot of bad talk about EAV these days, but I challenge anyone to come up with a solution other than EAV for dealing with an unlimited number of product attributes.
Sure, you could enumerate all the columns you would need for every product in the world, but that would take a lot of time and you would inevitably forget product attributes along the way.
So the bottom line is: use EAV for open-ended stuff, but don't rely on EAV for all of the database's tables. A hybrid of EAV and a relational DB, when done right, is a powerful tool whose flexibility could not be achieved with fixed columns alone.
Basically EAV is trying to implement a database within a database, and it leads to madness. The queries to pull data become overly complex, and your data has no stable, specific model to keep it in some kind of order.
I've written EAV systems for limited applications, but as a generic solution it's usually a bad idea.
What do you do before starting the database model diagram? I mean, how do you form the requirements, specifications, etc.? Use cases are one thing, but is there anything else? Some best practice or a rule of thumb? Being a self-learner, I want to see how it is done in the hands of professionals.
Make sure you have a complete list of requirements from your client. Do your best to understand these requirements completely; it will really help your design if you do. If YOU are defining the requirements, it may be easier, since you will already have an idea of what you need to do. Having a thorough grasp of your goal is the most important part.
If there is an obvious part of your database that will be the most important (an application in an online application system for instance) I will usually start from there and work out one piece at a time.
Personally, I like to draw rough pictures (whatever makes sense to you; it doesn't have to be an official ERD) of what I think the database will look like, and revise them to finer levels of detail.
Don't rely only on written requirements. There is no such thing as a complete list of requirements. Talk to the stakeholders, ask questions and use the results of those interviews to determine what attributes need to be modelled, how they are used and to identify the business keys. Then some data analysis and investigation is usually needed to determine the right data types and other aspects.
It may be possible to get a good first cut of a data model up front but don't worry if you can't. Data modelling generally ought to be an iterative, agile process, done in sensible sized steps as a project evolves (although there are certainly cases like Data Warehouse design where the agile approach may be harder to apply).
Depending on your clientele, it can be a good idea to have two data models and two diagrams. One model and diagram is for data analysis. The other is for database design.
I have had good results by using an ER (Entity-Relationship) model and diagram for data analysis and an RDM (Relational Data Model) model and diagram to reflect database design.
The ER diagram is useful for communicating the requirements discovered so far back to the clients, and making sure they are complete and correct. ER diagrams are easy to understand even if the client has no background in database theory. As others have responded, this is an iterative process, not a once only waterfall.
The RDM model and diagram is useful for reflecting logical database design decisions, such as the decision to normalize data or do something else. It's easy to derive an RDM model from an ER model, although you have to add some design decisions that are intentionally omitted from the ER diagram.
In turn, it's easy to build a table creation script from an RDM diagram. You will have to add some physical features, like indexes, in order to get good performance without tearing your hair out.
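As a small illustration of that last step, here is a hypothetical table as it might come out of an RDM diagram, plus a secondary index of the kind the diagram leaves out, using SQLite via Python's sqlite3 module. EXPLAIN QUERY PLAN is used only to check that the planner picks up the index; its exact output text varies between SQLite versions, so no expected output is shown.

```python
import sqlite3

# Table derived from a (hypothetical) RDM diagram, plus a physical-only
# addition: an index supporting a frequent lookup by department.
DDL = """
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    department_id INTEGER NOT NULL,
    last_name     TEXT NOT NULL
);
CREATE INDEX idx_employee_department ON employee (department_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT last_name FROM employee WHERE department_id = ?", (7,)
).fetchall()
# The last column of each plan row is a human-readable detail string.
for row in plan:
    print(row[-1])
```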
I started my first MySQL project by designing the ERD and the logical and physical diagrams.
A friend of mine is working on the same project as me. I started planning my database by making an ERD and then normalizing it.
However, he uses relational database diagrams and designs the interfaces and other parts first, before making the ERD. For example, he only writes "stack" in the phonenumbers column instead of making a "help table". He says it is best to make the interfaces first and then make the ERD.
In your opinion, which one of us is planning better?
One could write a book on this, and many people have.
However, to generalise, what I would typically do is:
Analyse your data and reduce it down to 3rd normal form. This should be pretty formulaic to accomplish.
In light of the likely business use of the data, decide if and where data should be denormalized. Typically, most databases will be overwhelmingly in 3rd normal form, with a few critical exceptions. This part is where experience and craft come in.
In light of the above create any additional indexes that may be necessary, or modify existing primary indexes (which should have been assigned in phase 1).
Create views for user access as necessary. The number you require may vary from none (as in a simple embedded application) to many (as in no direct data access to tables allowed).
Create any procedures you need, and possibly triggers (generally best avoided but appropriate for audit purposes).
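A sketch of the view and audit-trigger steps from the list above, in SQLite through Python's sqlite3 module. The customer schema and all names are hypothetical. The view exposes only the columns users should see, and the trigger writes one audit row per change to credit_limit.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customer (
    customer_id  INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    credit_limit INTEGER NOT NULL
);
CREATE TABLE customer_audit (
    audit_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    old_limit   INTEGER,
    new_limit   INTEGER,
    changed_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Users query the view rather than the base table.
CREATE VIEW customer_public AS
    SELECT customer_id, name FROM customer;
-- Audit trail: fires only when credit_limit is in the UPDATE's SET list.
CREATE TRIGGER trg_customer_limit_audit
AFTER UPDATE OF credit_limit ON customer
BEGIN
    INSERT INTO customer_audit (customer_id, old_limit, new_limit)
    VALUES (OLD.customer_id, OLD.credit_limit, NEW.credit_limit);
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO customer VALUES (1, 'Ann', 1000)")
conn.execute("UPDATE customer SET credit_limit = 2000 WHERE customer_id = 1")
print(conn.execute("SELECT * FROM customer_public").fetchall())  # [(1, 'Ann')]
print(conn.execute("SELECT customer_id, old_limit, new_limit FROM customer_audit").fetchall())  # [(1, 1000, 2000)]
```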
In practice of course the process is considerably more iterative, but the general design path from data to interface holds true. Also it's a good idea when designing a database to keep in mind that you will want to change it later and try if possible to make that a reasonably straightforward task.
I'm not sure what you mean by "interfaces and operations", but the way you're designing the schemas is right -- doing an ERD and properly normalizing. A lot of people, when they are just starting out, will take shortcuts on the design in order to fit the schema to their current level of querying skills.
For example, instead of creating a table of phone numbers and mapping these phone numbers to a "customers" table, they might just stick in columns called Phone1 Phone2 Phone3... instead. That can be the kiss of death later on when designing your queries.
So my advice... Create the normalized data model with ERDs. Then read up on VIEWs and user-defined functions in order to "flatten" out your schema where necessary for people who wish to query it. Sorry for the general answer, but it's kind of a general question...
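A sketch of that advice in SQLite via Python's sqlite3 module, with hypothetical table names: phone numbers live in their own table (any number per customer), and a view flattens them back into one column for people who just want to read the data. Note that SQLite's group_concat does not guarantee the order of the concatenated values.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per phone number instead of Phone1, Phone2, Phone3 columns.
CREATE TABLE customer_phones (
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    phone       TEXT NOT NULL,
    PRIMARY KEY (customer_id, phone)
);
-- A view that "flattens" the numbers back into one column for casual querying.
CREATE VIEW customer_contact AS
    SELECT c.customer_id, c.name, group_concat(p.phone, ', ') AS phones
    FROM customers c
    LEFT JOIN customer_phones p ON p.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
# A fourth number needs no schema change.
conn.executemany("INSERT INTO customer_phones VALUES (?, ?)",
                 [(1, "555-0100"), (1, "555-0101"), (1, "555-0102"), (1, "555-0103")])
# Searching by number is one predicate, not Phone1 = ? OR Phone2 = ? OR Phone3 = ?.
print(conn.execute("SELECT customer_id FROM customer_phones WHERE phone = ?",
                   ("555-0103",)).fetchall())  # [(1,)]
```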