Database design rules to follow for a programmer

Database design rules to follow for a programmer - database

We are working on a mapping application that uses Google Maps API to display points on a map. All points are currently fetched from a MySQL database (holding some 5M + records). Currently all entities are stored in separate tables with attributes representing individual properties.
This presents following problems:
Every time there's a new property we have to make changes in the database, application code and the front-end. This is all fine but some properties have to be added for all entities so that's when it becomes a nightmare to go through 50+ different tables and add new properties.
There's no way to find all entities which share any given property e.g. no way to find all schools/colleges or universities that have a geography dept (without querying schools,uni's and colleges separately).
Removing a property is equally painful.
No standards for defining properties in individual tables. Same property can exist with different name or data type in another table.
No way to link or group points based on their properties (somehow related to point 2).
We are thinking to redesign the whole database but without DBA's help and lack of professional DB design experience we are really struggling.
Another problem we're facing with the new design is that there are lot of shared attributes/properties between entities.
For example:
An entity called "university" has 100+ attributes. Other entities (e.g. hospitals,banks,etc) share quite a few attributes with universities for example atm machines, parking, cafeteria etc etc.
We dont really want to have properties in separate table [and then linking them back to entities w/ foreign keys] as it will require us adding/removing manually. Also generalizing properties will results in groups containing 50+ attributes. Not all records (i.e. entities) require those properties.
So with keeping that in mind here's what we are thinking about the new design:
Have separate tables for each entity containing some basic info e.g. id,name,etc etc.
Have 2 tables attribute type and attribute to store properties information.
Link each entity (or a table if you like) to attribute using a many-to-many relation.
Store addresses in different table called addresses link entities via foreign keys.
We think this will allow us to be more flexible when adding, removing or querying on attributes.
This design, however, will result in increased number of joins when fetching data e.g.to display all "attributes" for a given university we might have a query with 20+ joins to fetch all related attributes in a single row.
We desperately need to know some opinions or possible flaws in this design approach.
Thanks for your time.

In trying to generalize your question without more specific examples, it's hard to truly critique your approach. If you'd like some more in depth analysis, try whipping up an ER diagram.
If your data model is changing so much that you're constantly adding/removing properties and many of these properties overlap, you might be better off using EAV.
Otherwise, if you want to maintain a relational approach but are finding a lot of overlap with properties, you can analyze the entities and look for abstractions that link to them.
Ex) My Db has Puppies, Kittens, and Walruses all with a hasFur and furColor attribute. Remove those attributes from the 3 tables and create a FurryAnimal table that links to each of those 3.
Of course, the simplest answer is to not touch the data model. Instead, create Views on the underlying tables that you can use to address (5), (4) and (2)

1 cannot be an issue. There is one place where your objects are defined. Everything else is generated/derived from that. Just refactor your code until this is the case.
2 is solved by having a metamodel, where you describe which properties are where. This is probably needed for 1 too.
You might want to totally avoid the problem by programming this in Smalltalk with Seaside on a Gemstone object oriented database. Then you can just have objects with collections and don't need so many joins.

Related

Designing a database with similar, but different Models

I have a system whereby you can create documents. You select the document type to create and a form is displayed. Data is then added to the form, and the document can be generated. In Laravel things are done via Models. I am creating a new Model for each document but I don't think this is the best way. An example of my database :
So at the heart of it are projects. I create a new project; I can now create documents for this project. When I select project brief from a select box, a form is displayed whereby I can input :
Project roles
Project Data
Deliverables
Budget
It's three text fields and a standard input field. If I select reporting doc from the select menu, I have to input the data for this document (which is a couple of normal inputs, a couple of text fields, and a date). Although they are both documents, they expect different data (which is why I have created a Model for each document).
The problems: As seen in the diagram, I want to allow supporting documents to be uploaded alongside a document which is generated. I have a doc_upload table for this. So a document can have one or more doc_uploads.
Going back to the MVC structure, in my DocUpload model I can't say that DocUpload belongs to both ProjectBriefDoc and ProjectReportingDoc because it can only belong to one Model. So not only am I going to create a new model for every single document, I will have to create a new Upload model for each document as well. As more documents are added, I can see this becoming a nightmare to manage.
I am after a more generic Model which can handle different types of documents. My question relates to the different types of data I need to capture for each document, and how I can fit this into my design.
I have a design that can work, but I think it is a bad idea. I am looking for advice to improve this design, taking into account that each document requires different input, and each document will need to allow for file uploads.

You don't need to have a table/Model for each document type you'll create.
A more flexible approach would be to have a project_documents table, where you'll have a project_id and some data related to it, and then a doc_uploads related to the project_documents table.
This way a project can have as many documents your business will ever need and each document can have as many files as it needs.
You could try something like that:
If you still want to keep both tables, your doc_upload table in your example can have two foreign keys and two belongsTo() Laravel Model declarations without conflicts (it's not a marriage, it's an open relationship).
Or you could use Polymorphic Relations to do the same thing, but it's an anti-pattern of Database Design (because it'll not ensure data integrity on the database level).
For a good reference about Database Design, google for "Bill Karwin" and "SQL Antipatterns".
This guy has a very good Slideshare presentation and a book written about this topic - he used to be an active SO user as well.

ok.
I have a suggestion..you don't have to have such a tight coupling on the doc_upload references. You can treat this actually as a stand alone table in your model that is not pegged to a single entity.. You can still use the ORM to CRUD your way through and manage this table..
What I would do is keep the doc_upload table and use it for all up_load references for all documents no matter what table model the document resides in and have the following fields in the doc_upload table
documenttype (which can be the object name the target document object)
documentid_fk (this is now the generic key to a single row in the appropriate document type table(s)
So given a document in a given table.. (you can derive the documenttype based on the model object) and you know the id of the document itself because you just pulled it from the db context.. should be able to pull all related documents in the doc_upload table that match those two values.
You may be able to use reflection in your model to know what Entity (doc type ) you are in.. and the key is just the key.. so you should be able.
You will still have to create a new model Entity for each flavor of project document you wish to have.. but that may not be too difficult if the rate of change is small..
You should be able to write a minimum amount of code to e pull all related uploaded documents into your app..

You may use inheritance by zero-or-one relation in data model design.
IMO having an abstract entity(table) called project-document containing shared properties of all documents, will serve you.
project-brief and project-report and other types of documents will be children of project-document table, having a zero-or-one relation. primary key of project-document will be foreign key and primary key of the children.
Now having one-to-many relation between project-document and doc-upload will solve the problem.
I also suggest adding a unique constraint {project_id, doc_type} inside project-document for cardinal check (if necessary)

As other answers are sort of alluding to, you probably don't want to have a different Model for different documents, but rather a single Model for "document" with different views on it for your different processes. Laravel seems to have a good "templating" system for implementing views:
http://laravel.com/docs/5.1/blade
http://daylerees.com/codebright-blade/

Is there a pattern to avoid ever-multiplying link tables in database design?

Currently scoping out a new system. Like many systems, it will be required to store documents and link them to other kinds of item. In this instance a Document object can belong to a Job or it can belong to an Item (which in turn belongs to a Job).
We could do this by having a JobId and an ItemId against a Document and leaving one or the other blank if necessary, but that's going to mean annoying conditional logic in the handling code. So, two link tables seems a better idea.
However, it is likely that we will need to link Documents to other items in the system at some point in the future. There are Company and User objects, for example, and we might want to record Documents against those. There may be more.
That would entail a proliferation of link tables which, while effective, is messy and hard to follow.
This solution is in SQL Server and will be handled in code via Entity Framework.
Are there any design principles that can allow us to hook up Document objects with a variety of other system objects as required in a neater and more flexible way?

You could store two values: the id, and the type of object to which the document is attached. It doesn't allow the use of foreign keys, but is compatible with many application development frameworks.
If you have the partitioning option then you could dedicate different partitions to different object types.
You could also have multiple tables, one for job documents, one for item documents, and get an overview of all of them with a view that UNION ALL's them together. If you need uniqueness in that result set then you could use UUIDs for the primary key, or add an extra column to the view to express from which table the row was read.

Store multiple values in one database field in Access (hear me out)

So I've done extensive searching on this and I can't seem to find a good solution that actually applies to my situation.
I have a list of projects in a table, then a list of people. I want to assign multiple people to one project. Seems pretty common. Obviously, I can't make multiple columns on my projects table for each person, as the people will change fairly frequently.
I need to display this information very quickly in a continuous list of projects (the ultimate way would be a multiple-select combobox as a listbox is too tall, but they don't exist outside of the dreaded lookup fields)
I can think of two ways:
- Store multiple employee IDs delimited by commas in one field in my projects table (I know this goes against good database design). Would require some code to store and retrieve the data.
- Have a separate table for employees assigned to projects (ID, ProjectID, EmployeeID). One to many relationship between projects table and this new table. One to many relationship between employees table and this new table. If a project has 3 employees assigned, it would store 3 records in this table. It seems a bit odd joining both tables in this way, and would also require code to get it to store and retrieve into a control like the one mentioned above).
Does anyone know if there is a better way (including displaying in an easy control) or how you usually tackle this problem?

The usual way to tackle this problem would be with a Junction Table. This is what you describe where you have a separate table maybe called EmployeeProject which has an EmployeeProjectID(PK), EmployeeID(FK) and ProjectID(FK).
In this way you model a Many-to-Many relationship where each project can have many employees involved and each employee can be involved in many projects. It's not actually all that difficult to do the SQL etc. required to pull the information back together again for display.
I would definitely stay away from storing comma-delimited values as this becomes significantly more complicated when you want to display or manipulate the data.
There's a good guide here: http://en.tekstenuitleg.net/articles/software/create-a-many-to-many-relationship-in-access but if you google "many to many junction table" or similar, there are thousands of pages/articles about implementation.

How to model a database structure with repeating fields in every table

I'm in the process of structuring a databasemodel for my new project. For all the entities in my model (which is a cms, and the entities as such f.ex: page, content, menu, template and a bunch of others) they all have in common the same attributes on dates and names.
More specifically each entity contains the following for the dates: IsCreated, IsValidFrom, IsPublished, IsDeleted, IsEdited and IsExpired, and for names: CreatedByNameId, ValidFromByNameId, PublishedByNameId and so on...
I'm going to use EF5 for mapping to objects.
The question is as simple: What is the best way to structure this: Having all the fields in every table (which I am not obliged to...) or to have two separate tables which the other can relate to...?
Thanks in advance /Finn.

First of all - give this a read - http://www.agiledata.org/essays/mappingObjects.html
You really need to think about your queries/access paths. There are many tradeoffs between different implementations.
In reply to your example though,
Given the following setup:
COMMON
ValidFromByNameId
SPECIFIC1
FieldA
SPECIFIC2
FieldB
Querying by the COMMON attributes is easy but you'll have to work some magic when pulling up the subclasses (unless EF5 does it for you)
If the primary questions you're asking are about specific1 and specific2 then perhaps this isn't the right model. having the COMMON table doesn't really buy you much necessary as it will introduce a join to load any Specific1 object. In this case, i'd probably just have duplicate columns.
This answer is intentionally partial as a full answer is better handled by the numerous articles and blogs already out there. Search for "mapping object hierarchies to databases"

Entity Attribute Value model (EAV) and how to achieve it with cfml?

I'm trying to figure out how to implement this relationship in coldfusion. Also if anyone knows the name for this kind of relationship I'd be curious to know it.
I'm trying to create the brown table.
Recreating the table from the values is not the problem, the problem that I've been stuck with for a couple of days now is how to create an editing environment.
I'm thinking that I should have a table with all the Tenants and TenantValues (TenantValues that match TenantID I'm editing) and have the empty values as well (the green table)
any other suggestions?

The name of this relationship is called an Entity Attribute Value model (EAV). In your case Tenant, TenantVariable, TenantValues are the entity, attribute and value tables, respectively. EAV is attempt to allow for the runtime definition or entities and is most found in my experience backing content managements systems. It has been referred to an as anti pattern database model because you lose certain RDBMS advantages, while gaining disadvantages such as having to lock several tables on delete or save. Often a suitable persistence alternative is a NoSQL solution such as Couch.
As for edits, the paradigm I typically see is deleting all the value records for a given ID and inserting inside a loop, and then updating the entity table record. Do this inside of a transaction to ensure consistency. The upshot of this approach is that it's must easier to figure out than delta detection algorithm. Another option is using the MERGE statement if your database supports it.

You may want to consider an RDF Triple Store for this problem. It's an alternative to Relational DBs that's particularly good for sparse categorical data. The data is represented as triples - directed graph edges consisting of a subject, an object, and the predicate that describes the property connecting them:
(subject) (predicate) (object)
Some example triples from your data set would look something like:
<Apple> rdf:type <Red_Fruit>
<Apple> hasWeight "1"^^xsd:integer
RDF triple stores provide the SPARQL query language to retrieve data from your store much like you would use SQL.