Am I modelling my warehouse tables the right way?

Am I modelling my warehouse tables the right way? - database

I'm designing a website where users answer surveys. I need to design a data warehouse to aggregate their responses. So far in my model I have:
A dim table for Users.
A dim table for Questions.
A fact table for UserResponses. <= This is where I'm having the problem.
So the problem I have is that additional comments can be added to their responses. For example, somebody may come in and make 2 comments against a single response. How should I model this in the database?
I was thinking of creating another fact table for "Comments", and linking it to a record in UserResponses. Is this the right thing to do? This additional table would have something like the below columns:
CommentText
Foreign key relationship to fact.UserResponses.

Yes, your idea to create another table is correct. I would typically call it a "child" table rather than calling it another fact table.
The key thing that you didn't mention is that the table comments still needs an ID field. A table without an ID would be bad design (although it is indeed possible to create the table with no ID) since you would have no simple way to refer to individual comments.

In a dimension model, fact tables are never linked to each other, as the grain of the data will be compromised.
The back-end database of a client application is not usually a data warehouse schema, but more of an online transactional processing (OLTP) schema. This is because transactional systems work better with third normal form. Analytical systems work better with dimensional models because the data can be aggregated (i.e., "sliced and diced") more easily.
I would recommend switching back to an OLTP database. It can still be aggregated when needed, but maintains third normal form for easier transactional processing.
Here is a good comparison between a dimensional model (OLAP) and a transactional system (OLTP):
https://www.guru99.com/oltp-vs-olap.html

Related

Database modelling remove bridge table

I have designed a database with some outside help and I am thinking about a major change to the database model because of a problem creating reports in Power BI I have recently encountered here: SQL Power BI Report with Bridge Table
Disclaimer: if I could ask anyone within my firm, I would, but I can't.
We have a three-layer structure
A main table Firms, with information about the year and unique key/name for each firm
A bridge table Firm_Bridge which information about the type of the firm linked to Firm_Types
Many end Tables with information about sales and stuff
For these end tables, there are two types
Single type tables that only matter for one Type (End_Table_Type_A or End_Table_Type_B) (often the case)
Multi type tables that matter for more than one type (End_Table_Type_all) (rarely the case)
This is a made-up example similar to the real model
I am wondering if it was better to simplify this structure and directly connect the single type end tables to the firm table. I can still use the bridge table to document the type of firm and also connect the End_Table_Type_All to the firm's table. I imaging like this.
I hope this would avoid the problem that I have in my other question when summarizing stuff of end tables across years from Firms and reduce the complexity of our data model, as most tables are only single type end tables. I can exclude one join for most queries.
I am afraid I am missing something and that the current way is the proper way to model this.
What would happen in the changed model if I were to join the 'Firm_Type' to 'Firms' and then the single Type end table 'End_Table_Type_A'. Would it fetch the single end type table for each type?
Then, my idea would be stupid and I need to find another solution to my problem.

Just to give this closure. It seems to be indeed a stupid idea, because I would duplicate rows connected to the end tables, as soon as I merge the information of the type with the Firms table.

Complicated database design

We have a situation in a database design in our company. We are trying to figure out the best way to design the database to store transactional data. I need expert’s advice on the best relational design to achieve it. Problem: We have different kind of “Entities” in our system, for example; Customers, Services, Dealers etc. These Entities are doing transfer of funds between each other. We need to store the history of the transfers in database.
Solutions:
One table of transfers and another table to keep “Accounts” information. There are three tables “Customers”, “Services”, “Dealers”. There is another table “Accounts”. An account can be related to any of the “Entities” mentioned above; it means (and that’s the requirement) that logically there should be a one-to-one relationship to/from Entities and Accounts. However, we can only store the Account_ID in the Entities table, but we cannot store the foreign key of Entities in Accounts table. Here the problem happens in terms of database design. Because if there is a customer’s account, it is not restricted by the database design to not be stored in Services table etc. Now we can keep all transfers in one table only since Accounts are unified among all the entities.
Keep the balance information in the table primary Entities table and separate tables for all transfers. Here for all kind of transfers between the entities, we are keeping separate tables. For example, a transfer between a Customer and Service provider will be stored in a table called “Spending”. Another table will have transfer data for transfer between Service and Dealers called “Commission” etc. In this case, we are not storing all the transfers of the funds in a single table, but the foreign keys are properly defined since the tables “Spending” and “Commission” are only between two specific entities.
According to the best practices, which one of the above given solutions is correct, and why?

If you are simply looking for schemas that claim to deal with cases like yours, there is a website with hundreds of published schemas. Some of these pertain to storing transaction data concerning customers and suppliers. You can take one of these and adapt it.
http://www.databaseanswers.org/data_models/
If your question is about how to relate accounts to business contacts, read on.
Customers, Services, and Dealers are all sub classes of some super class that I'll call Contacts. There are two well known design patterns for modeling sub classes in database tables. And there is a technique called Shared primary Key that can be used with one of them to good advantage.
Take a look at the info and the questions grouped under these three tags:
single-table-inheritance class-table-inheritance shared-primary-key
If you use class table inheritance and shared primary key, you will end up with four tables pertaining to contacts: Contacts, Customers, Dealers, and Services. Every entry in Contacts will have a corresponding entry in one of the three subclass tables.
An FK in the accounts table, let's call it Accounts.ContactID will not only reference a row in Contacts, but also a row in whichever of Customers, Dealers, Services pertains to the case at hand.
This may work outwell for you. Alternatively, single table table inheritance works out well in some of the simpler cases. It depends on details about your data and your intended use of it.

You can make table Accounts with three fields with FK to Customers,Dealers and Services and it's will close problem. But also you can make three table for each type of entity with accounting data. You have the deal with multi-system case in system design. Each system solve the task. But for deсision you need make pros and con analyses about algorithm complexity, performance and other system requirements. For example one table will be more simple to code, but three table give more performance of sql database.

Performance in database design

I have to implement a testing platform. My database needs the following tables: Students, Teachers, Admins, Personnel and others. I would like to know if it's more efficient to have the FirstName and LastName in each of these tables, or to have another table, Persons, and each of the other table to be linked to this one with PersonID.
Personally, I like it this way, although trickier to implement, because I think it's cleaner, especially if you look at it from the object-oriented point of view. Would this add an unnecessary overhead to the database?
Don't know if it helps to mention I would like to use SQL Server and ADO.NET Entity Framework.

As you've explicitly mentioned OO and that you're using EntityFramework, perhaps its worth approaching the problem instead from how the framework is intended to work - rather than just building a database structure and then trying to model it?
Entity Framework Code First Inheritance : Table Per Hierarchy and Table Per Type is a nice introduction to the various strategies that you could pick from.
As for the note on adding unnecessary overhead to the database - I wouldn't worry about it just yet. EF is generally about getting a product built more rapidly and as it has to cope with a more general case, doesn't always produce the most efficient SQL. If the performance is a problem after your application is built, working and correct you can revisit and fix up the most inefficient stuff then.

If there is a person overlap between the mentioned tables, then yes, you should separate them out into a Persons table.
If you are only tracking what role each Person has (i.e. Student vs. Teacher etc) then you might consider just having the following three tables: Persons, Roles, and a bridge table PersonRoles.
On the other hand, if each role has it's own unique fields, then you should carry on as you are and leave each of these tables separate with a foreign key of PersonID.

If the attributes (i.e. First Name, Last Name, Gender etc) of these entities (i.e. Students, Teachers, Admins and Personnel) are exactly the same then you could just make a single table for all the entities with PersonType or Role attribute added to distinguish each person's role. However, if the entities has a lot of different attributes then it would be better that you create separate tables otherwise you will have normalization problem.

Yes that is a very bad way of structuring a DB. The DB structure should be designed based on the Normalizations.
Please check the normalization forms.
U should avoid the duplicate data as much as possible, else the queries will become slower.
And the main problem is when u r trying to get data that is associated with more than one or two tables.

Database Normalization Vocabulary

There is lot or material on database normalization available on Steve's Class and the Web. However, I still seem to lack on very definite reasons on explaining normalization.
For example, for a simple design such as a table Item with a Type field, it makes sense to have the Type as a separate table. The reason I forwarded for that was if in future any need arose to add properties to the Type, it would be much easier with a separate table already existing.
Are there more reasons which can be shown to be obvious?

Check these out too:
An Introduction to Database Normalization
A Simple Guide to Five Normal Forms
in Relational Database Theory

This article says it better than I can:
There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.
Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database.
What is an "inconsistent dependency"? While it is intuitive for a user to look in the Customers table for the address of a particular customer, it may not make sense to look there for the salary of the employee who calls on that customer. The employee's salary is related to, or dependent on, the employee and thus should be moved to the Employees table. Inconsistent dependencies can make data difficult to access because the path to find the data may be missing or broken.
following links can be useful:
http://support.microsoft.com/kb/283878
http://neerajtripathi.wordpress.com/2010/01/12/normalization-of-data-base/

Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization. In his own words:
To free the collection of relations from undesirable insertion, update and deletion dependencies;
To reduce the need for restructuring the collection of relations as new types of data are introduced, and thus increase the life span of application programs;
To make the relational model more informative to users;
To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.
— E.F. Codd, "Further Normalization of the Data Base Relational Model"
Taken word-for-word from Wikipedia:Database normalization

Is it a bad idea to make a generic link table?

Imagine a meta database with a high degree of normalization. It would blow up this input field if I would attempt to describe it here. But imagine, every relationship through the entire database, through all tables, go through one single table called link. It has got these fields: master_class_id, master_attr_id, master_obj_id, class_id2, obj_id2. This table can easily represent all kinds of relationships: 1:1, 1:n, m:n, self:self.
I see the problem that this table is going to get HUUUUGE. Is that bad practice?

That is wrong on two accounts:
It'll be a tremendous bottleneck for all your queries and it'll kill any chance of throughput.
It reeks of bad design: you should be able to describe things more concisely and closer to reality. If this is really the best way to store the data you can consider partitioning or even another paradigm instead of the relational

In a word, yes, this is a bad idea
Without going into too many details, I would offer the following:
for a meta database, the link table should be split by (high level) entity : that is, you should have a separate link table for each entity
another link table is required for the between-entities links
Normally the high-level entities are fairly easy to identify, like Customer.

It is usually bad practice but not because the table is huge. The problem is that you are mixing unrelated data in one table.
The reason to keep the links in separate tables, is because you won't need to use them together.
It is a common mistake that is also done with data itself: You should not mix two sets of data in one table only because the fields are similar if the data itself is unrelated.

Relational databases don't actually fit for this model.
It's possible to implement it but it will be quite slow. The main drawback is that you won't be able to index the links efficiently.
However, this design can be useful in two cases:
This only stores the metadata: declared relationships between the entities. The actual data are stored in the plain relational tables, so this links are only used to show the structure but not in the actual queries.
This stores some structures which are complex but contain few data, so that the ease of development overweights the performance drawbacks.
This design can be seen in several ORMs (one of which I even developed).

I don't see the purpose of this type of table anyway. If you have table A that is one-to-many to table B then A is going to still have a PK and B will still have a PK. A would normally contain a FK to B.
So in the Master_Table you will have to store A PK, B FK which is just a duplicate of what is already there. The only thing you will 'lose' is the FK in table A but you just migrated it into a giant table that is hard to deal with by the database, the dba, and anyone coding using the db.
Those table appear in Access most frequently and show up on the DailyWTF because they are insanely hard to read and understand.
Oh! And a main problem is that to make the table ubiquitous you will have to make generic columns which will probably end up destroying data integrity.