I'm trying to refactor parts of a legacy database schema and am having trouble coming up with the correct design.
The entities in question are:
samples, papers, studies
papers are associated with many samples
studies are associated with many samples
papers and studies have their own attributes not compatible with each other
samples can be associated with multiple papers and multiple studies
However, this separates out the grouping of papers and studies.
Here's how it looks:
An alternative I thought of: since both papers and studies just group samples together, I could combine the two groupings into a single group entity and have a FK from the group to the respective paper/study table.
Here's how it looks:
I'd like to know whether the designs look reasonable and what the tradeoffs between the two are. Also, are there alternative ways of modelling these relations?

I think the first design is the right one. There are two M:M relations, Paper - Sample and Study - Sample. They differ in domain logic, so it makes no sense to combine them into one relation and introduce extra entities for that purpose. The first schema is well normalized. What is your goal? What problem are you trying to solve?
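A minimal sketch of the two M:M relations, using Python's built-in sqlite3 module. The table and column names (sample, paper, study, paper_sample, study_sample, label, title, name) and the sample rows are assumptions made for illustration; the original diagrams are not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE paper  (paper_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE study  (study_id  INTEGER PRIMARY KEY, name  TEXT NOT NULL);

-- One junction table per M:M relation (hypothetical names).
CREATE TABLE paper_sample (
    paper_id  INTEGER NOT NULL REFERENCES paper(paper_id),
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    PRIMARY KEY (paper_id, sample_id)
);
CREATE TABLE study_sample (
    study_id  INTEGER NOT NULL REFERENCES study(study_id),
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    PRIMARY KEY (study_id, sample_id)
);
""")

# Illustrative data: sample S2 belongs to both a paper and a study.
conn.execute("INSERT INTO sample VALUES (1, 'S1'), (2, 'S2')")
conn.execute("INSERT INTO paper VALUES (1, 'Paper A')")
conn.execute("INSERT INTO study VALUES (1, 'Study X')")
conn.executemany("INSERT INTO paper_sample VALUES (?, ?)", [(1, 1), (1, 2)])
conn.execute("INSERT INTO study_sample VALUES (1, 2)")


def samples_for_paper(paper_id):
    """Return the labels of all samples linked to one paper."""
    rows = conn.execute(
        "SELECT s.label FROM sample s "
        "JOIN paper_sample ps ON ps.sample_id = s.sample_id "
        "WHERE ps.paper_id = ? ORDER BY s.label",
        (paper_id,),
    )
    return [r[0] for r in rows]


def samples_for_study(study_id):
    """Return the labels of all samples linked to one study."""
    rows = conn.execute(
        "SELECT s.label FROM sample s "
        "JOIN study_sample ss ON ss.sample_id = s.sample_id "
        "WHERE ss.study_id = ? ORDER BY s.label",
        (study_id,),
    )
    return [r[0] for r in rows]
```

Each relation can be changed independently: adding a sample to a paper touches only paper_sample and never affects any study.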
the schema doesn't have explicit grouping ...
OK, if you do require Group as a separate entity, your design could look like this:
The problem is that the Group entity is weak: it is hard to propose any attribute for it other than an ID. The scheme is also not handy to work with. When a user edits a paper's group, you have to decide how to handle it: should all other papers/studies that use the group see the change too, or do you have to create (or search for) the edited group and assign it to the paper? I think this is the wrong way to go if there is no additional business logic related to groups. Usually, when weak entities appear in a design, it means the set of abstractions has not been chosen properly. At the moment, I don't see how to justify the Group entity.
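The editing dilemma can be sketched as follows. This is only one possible reading of a Group-based design: the names sample_group and group_sample, and the choice to give each paper and study a single group_id, are assumptions, since the original diagram is not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY);
-- The group has no attribute besides its ID.
CREATE TABLE sample_group (group_id INTEGER PRIMARY KEY);
CREATE TABLE group_sample (
    group_id  INTEGER NOT NULL REFERENCES sample_group(group_id),
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    PRIMARY KEY (group_id, sample_id)
);
CREATE TABLE paper (
    paper_id INTEGER PRIMARY KEY,
    group_id INTEGER REFERENCES sample_group(group_id)
);
CREATE TABLE study (
    study_id INTEGER PRIMARY KEY,
    group_id INTEGER REFERENCES sample_group(group_id)
);
""")

conn.executemany("INSERT INTO sample VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO sample_group VALUES (10)")
conn.executemany("INSERT INTO group_sample VALUES (10, ?)", [(1,), (2,)])
# A paper and a study happen to share group 10.
conn.execute("INSERT INTO paper VALUES (1, 10)")
conn.execute("INSERT INTO study VALUES (1, 10)")


def paper_samples(paper_id):
    """Sample IDs reachable from a paper through its group."""
    rows = conn.execute(
        "SELECT gs.sample_id FROM paper p "
        "JOIN group_sample gs ON gs.group_id = p.group_id "
        "WHERE p.paper_id = ? ORDER BY gs.sample_id",
        (paper_id,),
    )
    return [r[0] for r in rows]


def study_samples(study_id):
    """Sample IDs reachable from a study through its group."""
    rows = conn.execute(
        "SELECT gs.sample_id FROM study s "
        "JOIN group_sample gs ON gs.group_id = s.group_id "
        "WHERE s.study_id = ? ORDER BY gs.sample_id",
        (study_id,),
    )
    return [r[0] for r in rows]


# "Editing the paper's group" by adding sample 3 to group 10
# silently changes what the study sees as well.
conn.execute("INSERT INTO group_sample VALUES (10, 3)")
```

To avoid the side effect, the application would have to copy or look up a matching group on every edit, which is exactly the extra logic described above.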


Is it necessary to create a new table if I have two tables with common column names? SQL Best practices

I need help doing a good database analysis, and I want to learn.
I have two tables:
These two tables have column names in common:
last name
level of education
Is it necessary to create a new table holding these common columns and relate it to the other two tables?
It is not necessary in this case. I am trying to do a good analysis and apply good practices.
These are just suggestions and not an answer.
This looks like a small question, but it is hard to answer because the possible answers may contradict each other.
It actually depends on your requirements. For example:
How many records are you going to manage in a table?
If the number is comparatively small, you can keep both patients and specialists in the same table with a flag to categorize them.
If it is comparatively large, you can keep separate Patient and Specialist tables, with the common fields repeated in each table.
What level of segregation do you expect from your system?
For example in microservices, keeping two different tables is better to isolate each service. But that also depends on the architecture you're going to use.
But separating the common fields into a different table and managing them there, as we would with OOP concepts, is not necessary, because unwanted DB relations put an extra burden on your queries.
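The two options mentioned above (one table with a flag, or two separate tables that each repeat the common fields) could look roughly like this. It is a sketch using Python's sqlite3 module; the column names last_name and education_level follow the question, while person, role and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option A: one table, categorized by a flag column.
CREATE TABLE person (
    person_id       INTEGER PRIMARY KEY,
    role            TEXT NOT NULL CHECK (role IN ('patient', 'specialist')),
    last_name       TEXT NOT NULL,
    education_level TEXT
);

-- Option B: two tables, each with its own copy of the common columns.
CREATE TABLE patient (
    patient_id      INTEGER PRIMARY KEY,
    last_name       TEXT NOT NULL,
    education_level TEXT
);
CREATE TABLE specialist (
    specialist_id   INTEGER PRIMARY KEY,
    last_name       TEXT NOT NULL,
    education_level TEXT
);
""")

# Illustrative rows only.
conn.executemany(
    "INSERT INTO person (role, last_name, education_level) VALUES (?, ?, ?)",
    [("patient", "Lopez", "secondary"), ("specialist", "Kim", "doctorate")],
)
conn.execute(
    "INSERT INTO patient (last_name, education_level) VALUES ('Lopez', 'secondary')"
)
conn.execute(
    "INSERT INTO specialist (last_name, education_level) VALUES ('Kim', 'doctorate')"
)

# Option A filters on the flag; option B just reads the dedicated table.
patients_a = [r[0] for r in conn.execute(
    "SELECT last_name FROM person WHERE role = 'patient'")]
patients_b = [r[0] for r in conn.execute("SELECT last_name FROM patient")]
```

Neither option needs a join to read a patient, which is the point of not splitting the common columns into a third table.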
This is just my view. Others may give you different ones :)

database normalization first normal form confusion - when should separating tables out

Please consider this from an academic point of view, not a practical engineering one. This is about 1NF and 1NF only.
Consider the unnormalized form below, whose primary key is {trainingDateTime, employeeNumber}. How would you bring it to first normal form?
If we separate course, instructor and employee tables out as separate tables, it will automatically become 3NF.
If I split into different rows, it would be something like:
But problem here is obvious - the primary key is no longer valid.
Changing the primary key to {trainingDateTime, employeeNumber, employeeSkill} doesn't seem to be a sensible solution.
Just to satisfy 1NF, you need separate rows for the individual teaching skills. But you should then also make sure the higher normal forms are satisfied, by splitting the tables.
So for the same employee, one row should have the teaching skill Advanced PHP, a second row Advanced Java, a third row Advanced SQL, and so on.
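A small Python sketch of the row splitting described above. The rows are made up for illustration (the original table is not shown); only the column names trainingDateTime, employeeNumber and employeeSkill and the three skill names come from the thread.

```python
# Hypothetical unnormalized rows: employeeSkill holds several values at once.
unnormalized = [
    {
        "trainingDateTime": "2024-01-15 09:00",
        "employeeNumber": 1,
        "employeeSkill": ["Advanced PHP", "Advanced Java", "Advanced SQL"],
    },
    {
        "trainingDateTime": "2024-01-15 09:00",
        "employeeNumber": 2,
        "employeeSkill": ["Advanced SQL"],
    },
]

OLD_KEY = ("trainingDateTime", "employeeNumber")
NEW_KEY = OLD_KEY + ("employeeSkill",)


def to_1nf(rows):
    """Emit one row per skill so every attribute holds a single value."""
    flat = []
    for row in rows:
        for skill in row["employeeSkill"]:
            flat.append({**row, "employeeSkill": skill})
    return flat


def is_key(rows, columns):
    """True if the given columns identify every row uniquely."""
    seen = {tuple(row[c] for c in columns) for row in rows}
    return len(seen) == len(rows)


flat = to_1nf(unnormalized)
```

On this data, is_key(flat, OLD_KEY) is False and is_key(flat, NEW_KEY) is True: after the split, the original key no longer identifies a row, so as far as 1NF alone is concerned the key has to include the skill. Whether that table is a good design is a question for the higher normal forms.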
Together with your other question, "database normalization - merge/combine tables", it seems you are looking for an answer to a question you did not ask.
With regard to your comment "in practice i can't imagine anyone would start from a complete unormalized form.", I would think your question is more like: why do we need the normalization rules in the way they are formulated in order to normalize efficiently? I guess your real motivation plays a role here.
Normalization is typically perceived as a process or a methodology, and there is no harm in that. However, the way the normalization rules are formulated also allows them to be used as a checklist: you can check an arbitrary set of tables of arbitrary size against the rules and confirm or reject compliance. So even if you can find thousands of examples where the very first natural version of a schema already complies, you can also find thousands of others that fail those same rules.
In fact, trying to squeeze multiple loosely coupled pieces of information into a historically grown collection of MS Excel tables across several sheets is usually an extraordinary source of violations of any set of normalization rules (e.g. rendering a business case and connecting it with planning aspects and resource planning)...

Data Modeling for consumer goods

A company is trying to build a system that breaks consumer goods (soft drinks, detergents, beauty products, etc.) down to their most basic components. The aim is to break down all the characteristics of a product into as many enumerable quantities as possible. For instance, a soft drink will have the properties flavor, calories, color, cost, etc. Note that the products will come from a huge variety of segments, that not all properties apply to all products (detergents don't have calories), and that similar-sounding properties are not necessarily the same (a lime fragrance in a detergent is different from a lime flavor in a soft drink). Also, search is expected to be fast and the database needs to understand relationships between products. Suggest only a data model for the same.
The feature you highlight, that not all properties describe all products, is a classic feature of a class/subclass situation. Or, if you prefer, type/subtype.
Dealing with just that feature of the problem, I'm going to call your attention to the EER (Extended Entity Relationship) model if you want to model your understanding of the subject matter. The EER has a way of depicting what it calls a generalization/specialization pattern. That's a good search term to find detailed descriptions of it. This will adequately depict what you've said you're after.
A word of caution, however. The majority of ER models you'll see here in SO are design models, not conceptual models. That is, they reflect the intent of designing tables made up of columns and rows, with keys and foreign keys, to contain the relevant data.
What I'm recommending is the EER model for a very different purpose. It's to depict the way the data looks to the subject matter expert, not the way the data looks to the database designer. That distinction is lost on those who have never learned the difference between analysis and design.
If your project is a major one, it's worth spending an appropriate amount of time on a detailed analysis of the subject matter before moving on to design. Understanding the problem before you try to solve it is key to successful work on big projects.
Once you have a good conceptual model that captures the analysis, the choice of a data model to reflect the design will depend on what kind of database you've decided to build. It might be relational, it might be multidimensional, it might be unstructured. It depends. The analysis, however, will be more useful if it's implementation independent.

How to model a database structure with repeating fields in every table

I'm in the process of structuring a database model for my new project. The project is a CMS, with entities such as page, content, menu, template and a bunch of others, and all of these entities share the same date and name attributes.
More specifically each entity contains the following for the dates: IsCreated, IsValidFrom, IsPublished, IsDeleted, IsEdited and IsExpired, and for names: CreatedByNameId, ValidFromByNameId, PublishedByNameId and so on...
I'm going to use EF5 for mapping to objects.
The question is simple: what is the best way to structure this? Should I have all the fields in every table (which I am not obliged to...), or two separate tables that the others can relate to?
Thanks in advance /Finn.
First of all - give this a read -
You really need to think about your queries/access paths. There are many tradeoffs between different implementations.
In reply to your example though,
Given the following setup:
Querying by the COMMON attributes is easy, but you'll have to work some magic when pulling up the subclasses (unless EF5 does it for you).
If the primary questions you're asking are about specific1 and specific2, then perhaps this isn't the right model. Having the COMMON table doesn't necessarily buy you much, as it introduces a join to load any Specific1 object. In this case, I'd probably just have duplicate columns.
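The tradeoff can be sketched with Python's sqlite3 module. The setup the answer referred to is not available, so this layout is an assumption: a common table sharing its key with specific1, next to a specific1_flat variant that repeats the common columns. IsCreated and CreatedByNameId are taken from the question; title and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant A: shared attributes live in COMMON, specifics in their own table.
CREATE TABLE common (
    id              INTEGER PRIMARY KEY,
    IsCreated       TEXT NOT NULL,
    CreatedByNameId INTEGER
);
CREATE TABLE specific1 (
    id    INTEGER PRIMARY KEY REFERENCES common(id),
    title TEXT NOT NULL
);

-- Variant B: the common columns are simply repeated in each table.
CREATE TABLE specific1_flat (
    id              INTEGER PRIMARY KEY,
    title           TEXT NOT NULL,
    IsCreated       TEXT NOT NULL,
    CreatedByNameId INTEGER
);
""")

conn.execute("INSERT INTO common VALUES (1, '2013-01-01', 7)")
conn.execute("INSERT INTO specific1 VALUES (1, 'Home page')")
conn.execute("INSERT INTO specific1_flat VALUES (1, 'Home page', '2013-01-01', 7)")

# Variant A needs a join to load one complete Specific1 object ...
joined = conn.execute(
    "SELECT s.id, s.title, c.IsCreated, c.CreatedByNameId "
    "FROM specific1 s JOIN common c ON c.id = s.id WHERE s.id = 1"
).fetchone()

# ... while variant B reads it from a single table.
flat = conn.execute(
    "SELECT id, title, IsCreated, CreatedByNameId "
    "FROM specific1_flat WHERE id = 1"
).fetchone()
```

Variant A pays off mainly when queries run across all entity types on the common columns; variant B keeps every per-entity read join-free.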
This answer is intentionally partial, as a full answer is better handled by the numerous articles and blogs already out there. Search for "mapping object hierarchies to databases".

Supertype/subtype db design with subtype cross-link

This is probably a simple problem for an experienced database developer, but I'm struggling... I have trouble translating a certain ER diagram to a DB model, any help is appreciated.
I have a setup similar to slide 17 of this presentation:
Slide 17 shows an ER diagram with an Employee supertype having an Employee Type attribute and as subtypes the Employee Types themselves (Hourly, Salaried and Consultant), which is very similar to my design situation.
In my case, suppose Salaried Employees are the only ones that can be bosses of other employees and I wanted to somehow indicate if a certain Salaried employee is the boss of the Hourly and/or Salaried Employee and/or Consultant (either, none or both), how could that be designed in a database model, also considering these are one-to-many relationships?
I can put a PK-FK relationship between them, which would result in all tables having two FKs (like Consultant having FK_Employee and FK_SalariedEmployee) and SalariedEmployee referencing itself, but I keep thinking that might not be the wisest solution... although I'm not sure why (integrity issues?).
Is this an acceptable solution, or is there a better one?
Thanks in advance for any help!
Your case looks like an instance of the design pattern known as “Generalization Specialization” (Gen-Spec for short). The gen-spec pattern is familiar to object oriented programmers. It’s covered in tutorials when teaching about inheritance and subclasses.
The design of SQL tables that implement the gen-spec pattern can be a little tricky. Database design tutorials often gloss over this topic. But it comes up again and again in practice.
If you search the web on “generalization specialization relational modeling” you’ll find several useful articles that teach you how to do this. You’ll also be pointed to several times this topic has come up before in this forum.
The articles generally show you how to design a single table to capture all the generalized data and one specialized table for each subclass that will contain all the data specific to that subclass. The interesting part involves the primary key for the subclass tables. You won’t use the autonumber feature of the DBMS to populate the subclass primary key. Instead, you’ll program the application to propagate the primary key value obtained for the generalized table to the appropriate subclass table.
This creates a two way association between the generalized data and the specialized data. A simple view for each specialized subclass will collect generalized and specialized data together. It’s easy once you get the hang of it, and it performs fairly well.
In your specific case, declaring the "boss of" FK to reference the PK in the Salaried Employees table will be enough to do the trick. This will produce the two way association you want, and also prevent employees who are not salaried from being referenced as bosses.
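A minimal sketch of this pattern with Python's sqlite3 module. All table and column names (employee, salaried_employee, hourly_employee, boss_id, annual_salary, hourly_rate) are hypothetical, and placing boss_id on the generalized employee table is one assumed option among several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Generalized table: data shared by every employee.
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    employee_type TEXT NOT NULL
        CHECK (employee_type IN ('hourly', 'salaried', 'consultant')),
    -- "boss of" points at the salaried subtype, not at employee in general.
    boss_id       INTEGER REFERENCES salaried_employee(employee_id)
);
-- Specialized tables reuse the generalized key as their own primary key.
CREATE TABLE salaried_employee (
    employee_id   INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    annual_salary REAL NOT NULL
);
CREATE TABLE hourly_employee (
    employee_id INTEGER PRIMARY KEY REFERENCES employee(employee_id),
    hourly_rate REAL NOT NULL
);
-- A simple view collects generalized and specialized data together.
CREATE VIEW salaried_employee_v AS
SELECT e.employee_id, e.name, e.boss_id, s.annual_salary
FROM employee e
JOIN salaried_employee s ON s.employee_id = e.employee_id;
""")


def add_employee(name, employee_type, boss_id=None):
    """Insert the generalized row and return its key for the subtype row."""
    cur = conn.execute(
        "INSERT INTO employee (name, employee_type, boss_id) VALUES (?, ?, ?)",
        (name, employee_type, boss_id),
    )
    return cur.lastrowid


# The application propagates the generated key to the subtype table.
alice = add_employee("Alice", "salaried")
conn.execute("INSERT INTO salaried_employee VALUES (?, ?)", (alice, 90000.0))
bob = add_employee("Bob", "hourly", boss_id=alice)
conn.execute("INSERT INTO hourly_employee VALUES (?, ?)", (bob, 25.0))

# Bob is hourly, so he cannot be referenced as a boss.
try:
    add_employee("Carol", "consultant", boss_id=bob)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

salaried_rows = conn.execute("SELECT * FROM salaried_employee_v").fetchall()
```

Because boss_id sits on the generalized table, a salaried employee can also have a salaried boss without any extra self-referencing column in the subtype tables.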
