A company is trying to build a system that breaks down consumer goods (soft drinks, detergents, beauty products, etc.) into their most basic components. The aim is to decompose all the characteristics of a product into as many enumerable quantities as possible. For instance, a soft drink will have the properties flavor, calories, color, cost, etc. Note that the products come from a huge variety of segments, not all properties will be applicable to all products (detergents don't have calories), and similar-sounding properties are not the same (a detergent with a lime fragrance is different from a lime-flavored soft drink). Also, search is expected to be fast and the database needs to understand relationships between products. Suggest only a data model for this.
The feature you highlight, that not all properties describe all products, is a classic feature of a class/subclass situation. Or, if you prefer, type/subtype.
Dealing with just that feature of the problem, I'm going to call your attention to the EER (Extended Entity Relationship) model if you want to model your understanding of the subject matter. The EER has a way of depicting what it calls a generalization/specialization pattern. That's a good search term to find detailed descriptions of it. This will adequately depict what you've said you're after.
A word of caution, however. The majority of ER models you'll see here on SO are design models, not conceptual models. That is, they reflect the intent of designing tables made up of columns and rows, with keys and foreign keys, to contain the relevant data.
What I'm recommending is the EER model for a very different purpose. It's to depict the way the data looks to the subject matter expert, not the way the data looks to the database designer. That distinction is lost on those who have never learned the difference between analysis and design.
If your project is a major one, it's worth spending an appropriate amount of time on a detailed analysis of the subject matter before moving on to design. Understanding the problem before you try to solve it is key to successful work on big projects.
Once you have a good conceptual model that captures the analysis, the choice of a data model to reflect the design will depend on what kind of database you've decided to build. It might be relational, it might be multidimensional, it might be unstructured. It depends. The analysis, however, will be more useful if it's implementation independent.
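If the design does end up relational, one common rendering of the generalization/specialization pattern looks roughly like the sketch below. This is only an illustration: the model names (Product, SoftDrink, Detergent) and fields are mine, not part of the question, and it is written as Django models purely for concreteness.

from django.db import models

# One table for the generalized product data...
class Product(models.Model):
    name = models.CharField(max_length=200)                        # illustrative fields
    cost = models.DecimalField(max_digits=10, decimal_places=2)

# ...and one table per specialization, sharing the parent's primary key.
class SoftDrink(models.Model):
    product = models.OneToOneField(Product, primary_key=True, on_delete=models.CASCADE)
    flavor = models.CharField(max_length=100)
    calories = models.IntegerField()

class Detergent(models.Model):
    product = models.OneToOneField(Product, primary_key=True, on_delete=models.CASCADE)
    fragrance = models.CharField(max_length=100)

Properties that apply only to soft drinks live in SoftDrink and detergent-only properties live in Detergent, so a lime flavor and a lime fragrance never share a column.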
I'm trying to refactor some parts of a legacy database schema and am having trouble coming up with the correct design.
The entities in question are:
samples, papers, studies
papers are associated with many samples
studies are associated with many samples
papers and studies have their own attributes not compatible with each other
samples can be associated with multiple papers and multiple studies
However, this keeps the grouping of samples by papers separate from the grouping of samples by studies.
Here's how it looks:
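In Django-model terms, the first design is roughly the following sketch (model and field names are only illustrative):

from django.db import models

class Sample(models.Model):
    label = models.CharField(max_length=200)       # sample-specific attributes

class Paper(models.Model):
    title = models.CharField(max_length=200)       # paper-specific attributes
    samples = models.ManyToManyField(Sample)       # M:N via an implicit junction table

class Study(models.Model):
    protocol = models.CharField(max_length=200)    # study-specific attributes
    samples = models.ManyToManyField(Sample)       # a second, independent M:N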
An alternative I thought of: since both papers and studies just group samples together, I could combine these into one Group entity and have an FK from the group to the respective paper/study table.
Here's how it looks:
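Again only as a rough Django-style sketch (names illustrative), reading the FK as sitting on the paper/study side and pointing at the shared group:

from django.db import models

class Sample(models.Model):
    label = models.CharField(max_length=200)

class Group(models.Model):
    samples = models.ManyToManyField(Sample)       # the shared grouping of samples

class Paper(models.Model):
    title = models.CharField(max_length=200)
    group = models.ForeignKey(Group, on_delete=models.CASCADE)   # each paper points at its group

class Study(models.Model):
    protocol = models.CharField(max_length=200)
    group = models.ForeignKey(Group, on_delete=models.CASCADE)   # each study points at its group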
I'd like to know whether the designs look reasonable and whether there are any tradeoffs between the two. Also, are there alternatives for modelling these relations?
I think the first design is the right one. There are two M:N relations, Paper - Sample and Study - Sample. They differ in domain logic, so there is no sense in combining them into one relation and introducing extra entities for that purpose. The first schema is well normalized. What is your goal? What problems are you trying to solve?
the schema doesn't have explicit grouping ...
OK, if you do require Group as a separate entity, your design could look like this:
The problem is that the Group entity is weak. It is hard to propose any attribute for it other than an ID. It is also not handy to work with this scheme: when a user edits a paper's group, you have to decide how to handle the situation. Should all other papers/studies 'see' this change too, or do you have to create/find the edited group and assign it to the paper? I think this is the wrong way to go if there is no additional business logic related to groups. Usually, when weak entities appear in a design, it means the set of abstractions was not chosen properly. At the moment, I don't see how to justify the Group entity.
I'm doing an assignment about databases, and now I need to show three different images: one with the conceptual model, another with the logical model, and another with the physical model of a database.
But I'm having some difficulty understanding which image represents each model.
I've looked for reliable information about this, but I keep finding different answers and I'm a bit confused.
So I came here to see if you can help me.
I have my three images below; do you think I have the correct title for each image?
Conceptual Model:
In the conceptual model, I think I need to put my tables with attributes but without relationships.
Logical Model:
In the logical model, I think I need to put my tables with attributes, but now with my relationships.
Physical Model:
In the physical model, I think I need to put my tables with attributes, but now with my relationships and also with foreign keys.
A Conceptual Model (CM) is an informal representation of the business represented in a manner that is understood by users. It will consist of classes of entities with attributes and the business rules regarding these. It is often presented as Entity-Relationship Diagrams.
A Logical Model (LM) formalizes the CM into data structures and integrity constraints. It should include all the data structures and integrity constraints for the data (this is all constraints, not just the subset of constraints that are easily defined in most available database management systems). It is database management system agnostic.
The LM may be presented as a Relational Data Model (RDM), in which case all the data structures and integrity constraints will be formally represented using only mathematical relations.
A Physical Model (PM) is a representation of the LM on specific hardware and database management system. It may consist of information such as storage sizing and placement; access methods such as indexing; and distribution such as clustering or partitioning.
Using these definitions I would say that all your diagrams are versions of Conceptual Models, as they do not include all the integrity constraints for the data being managed and do not include any information regarding an implementation on specific hardware or a database management system.
The conceptual/logical/physical layers have changed somewhat over the years, and also vary according to different schools of thought. The way I learned it, back in the 1980s was this:
The conceptual model summarizes the semantics of the data with reference to the subject matter. It is not bound to a relational implementation. The implementation could be in some sort of prerelational database, or even in classical files of records. You have entities, relationships, attributes, and domains. You also have business rules. That's about it. Like your summary, it's primarily for communication with users and other stakeholders. The idea is to pin down the requirements during the analysis phase.
The logical model is a preliminary design. It's bound to the relational model, but not to a specific DBMS. You have relations, tuples, attributes, and constraints. Relationships are implemented as foreign keys, sometimes requiring junction relations. I tended to use the terminology of tables, rows, and columns, instead of relations, tuples, and attributes, but that's mostly nomenclature. Normalization is relevant here.
The physical model is a detailed design. It's DBMS specific, and takes into account data volume, expected traffic, and performance. Denormalization is relevant here. This leads directly to a creation script.
This is by no means the majority view, let alone a general consensus. You need to understand your audience to see if this framework works.
Is this homework or what? The question seems rather artificial...
The 3rd one is Physical because the data types are closer to actual DBMS data types.
Between the 1st and 2nd ones... I'm stuck. The only difference is the crow's-foot relationship. If there's a progression between the three images, I'd guess this would make the 2nd one the Conceptual.
But it is difficult because, with PowerDesigner, you could still represent the relationship with crow's-foot notation in the Logical model. But anyway, there should be evidence of the migration of the "foreign key" attribute id_cat into the News entity, which is missing here.
Nope. I was reading my example diagrams too fast; there's no migration in the Logical model.
So, just by elimination, I'd make the 1st one the Logical.
I am trying to design a Person database. The requirement is that a Person can have one or more varying number of children, cars, jobs, and homes.
So, currently, the way I have designed this is:
from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=200)   # max_length values here are illustrative
    dob = models.DateField()
    city = models.CharField(max_length=100)
    # ... some standard base person data
Since I want to support variable number of associations, I create separate tables with one-to-many relationships. For example, I have
class Home(models.Model):
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    home_address = models.CharField(max_length=200)
    # ...

class Job(models.Model):
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    company_name = models.CharField(max_length=200)
    office_address = models.CharField(max_length=200)
    # ...
And so on for other fields.
This works fine because I can have as many or as few entries per person.
The downside is that for each Person, I do lookups on 5-6 tables. I am going to need more fields, so the lookups will increase.
Is there a paradigm to efficiently design this kind of scenario?
If it is of interest, I use Django with PostgreSQL.
Edit:
The server is mostly making REST API responses off the database. The browser client needs the entire data for one Person at one go (to reduce multiple requests over network). So I will have to do the multiple joins together.
Actually, for my Person table, I really do not need any relational features; other tables in my DB are heavily relational. The reason I am thinking about this now is that I suspect the many joins will result in slower performance, and changing the design later will be difficult.
I also came across JSONField for PostgreSQL, and I was wondering whether I should use it to store the "hanging-off" data so that the REST calls do not result in a multitude of JOINs. Since this is the design stage, I am thinking about the issue now because I am not sure changing it later will be feasible.
Thanks a lot for your inputs.
Your design is correct. The number of tables is a reflection of the complexity (or not) of the application.
The "paradigm to efficiently design this kind of scenario" is the relational model and you are designing in terms of tables because you are working within that paradigm.
Your notions about "the downside" and "lookups" and "efficiency" presume implementation aspects without justification. The DBMS takes your declarations and updates and answers your queries and hides how. Implementation issues do arise, but far from the level of experience and knowledge suggested by your question.
Just make a straightforward design.
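For what it's worth, if the number of queries per REST response ever does become a concern, the related rows can be fetched in a handful of queries rather than one per person per table. A minimal sketch, assuming the Django models above (the reverse accessor names assume the default related_name):

some_person_id = 1   # placeholder id for illustration

# One query for the person plus one per prefetched table, regardless of row counts.
person = (
    Person.objects
    .prefetch_related("home_set", "job_set")
    .get(pk=some_person_id)
)
homes = list(person.home_set.all())   # already in memory, no extra query
jobs = list(person.job_set.all())

This keeps the normalized design while letting the API build the whole Person payload in one pass.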
I am about to design my first e-commerce database.
What I have found on most e-commerce websites is that these sites have a Category, then a SubCategory, then another SubCategory, and so on. The depth of subcategories is not fixed: one category may have six nested subcategories while another has a different depth.
Now, all the products have attributes associated with them.
My question is: do these websites keep adding tables for the nested subcategories and keep adding columns for the attributes in the database,
OR
do they apply something called the "EAV" model (if I am right) to solve this problem? Or do they keep adding columns and/or tables and also keep updating the web pages, since on many sites I have found there is now a new category?
(If they use the EAV model, then the website performance is impacted, isn't it?)
Since this is my first e-commerce project, please provide some valuable suggestions.
Thanks,
Any help is appreciated.
What you need is a combination of EAV for product features and nested sets for product categories.
While I certainly agree that EAV is almost always a bad choice, one application where EAV is the perfect choice is for handling product attributes in an online catalog.
Think about how websites show product attributes... The attributes of products are always shown as a vertical list with two columns: "Attribute" | "Value". Sometimes these lists show side-by-side comparisons of multiple products. EAV works perfectly for doing this kind of thing. The things that make EAV meaningless and inefficient for most applications are exactly what makes EAV meaningful and efficient for product attributes in an online catalog.
One of the reasons why everyone always says "EAV is EVIL!" is that the attributes in EAV are "meaningless" insofar as the column name (i.e. the meaning of the attribute) is table-driven and is therefore not defined by the schema. The whole point of a schema is to give your model meaning, so this point is well taken. However, in the case of an online product catalog, the meaning of product attributes is really unimportant to the system itself. The only reason your catalog system cares about product attributes is to dump them in a list or possibly in a product comparison matrix. Therefore EAV doesn't happen to be evil in this particular case.
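As a rough illustration (table and field names here are mine, not a prescription), the EAV side can be as small as one attribute table hanging off the product, sketched as Django models:

from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=200)

class ProductAttribute(models.Model):
    # One row per ("Attribute" | "Value") pair shown on the product page.
    product = models.ForeignKey(Product, on_delete=models.CASCADE, related_name="attributes")
    name = models.CharField(max_length=100)    # e.g. "Fragrance"
    value = models.CharField(max_length=200)   # stored as text, meaning lives in the data

    class Meta:
        unique_together = [("product", "name")]

# Rendering the attribute list for a product page:
# for attr in product.attributes.all(): print(attr.name, "|", attr.value)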
For product categories, you want a nested set model, as I described in the answer to this question. Nested sets give you very quick retrieval along with the ability to traverse multiple levels of an unbalanced hierarchy at the expense of some precalculation effort at edit time.
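A nested-set category sketch might look like the following; the lft/rght numbering has to be maintained on every edit (in practice a library such as django-mptt handles that bookkeeping), and the names are illustrative:

from django.db import models

class Category(models.Model):
    name = models.CharField(max_length=100)
    lft = models.IntegerField()    # left bound of the node's interval
    rght = models.IntegerField()   # right bound of the node's interval

# Fetch an entire subtree (any depth) with a single range query:
def subtree(category):
    return Category.objects.filter(lft__gte=category.lft, rght__lte=category.rght).order_by("lft")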
This is probably a simple problem for an experienced database developer, but I'm struggling... I have trouble translating a certain ER diagram to a DB model, any help is appreciated.
I have a setup similar to slide 17 of this presentation:
http://www.cbe.wwu.edu/misclasses/mis421s04/presentations/supersubtype.ppt
Slide 17 shows an ER diagram with an Employee supertype having an Employee Type attribute and as subtypes the Employee Types themselves (Hourly, Salaried and Consultant), which is very similar to my design situation.
In my case, suppose salaried employees are the only ones that can be bosses of other employees, and I want to somehow indicate whether a certain salaried employee is the boss of an hourly employee, a salaried employee and/or a consultant (any, none or all of them). How could that be designed in a database model, also considering these are one-to-many relationships?
I can put a PK-FK relationship between them, which would result in all tables having two foreign keys (like Consultant having FK_Employee and FK_SalariedEmployee) and SalariedEmployee referencing itself, but I keep thinking that might not be the wisest solution... although I'm not sure why (integrity issues?).
Is this an acceptable solution, or is there a better one?
Thanks in advance for any help!
Your case looks like an instance of the design pattern known as “Generalization Specialization” (Gen-Spec for short). The gen-spec pattern is familiar to object oriented programmers. It’s covered in tutorials when teaching about inheritance and subclasses.
The design of SQL tables that implement the gen-spec pattern can be a little tricky. Database design tutorials often gloss over this topic. But it comes up again and again in practice.
If you search the web on “generalization specialization relational modeling” you’ll find several useful articles that teach you how to do this. You’ll also be pointed to several times this topic has come up before in this forum.
The articles generally show you how to design a single table to capture all the generalized data and one specialized table for each subclass that will contain all the data specific to that subclass. The interesting part involves the primary key for the subclass tables. You won't use the autonumber feature of the DBMS to populate the subclass primary key. Instead, you'll program the application to propagate the primary key value obtained for the generalized table to the appropriate subclass table.
This creates a two way association between the generalized data and the specialized data. A simple view for each specialized subclass will collect generalized and specialized data together. It’s easy once you get the hang of it, and it performs fairly well.
In your specific case, declaring the "boss of" FK to reference the PK in the Salaried Employees table will be enough to do the trick. This will produce the two way association you want, and also prevent employees who are not salaried from being referenced as bosses.
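A sketch of that arrangement, again only illustrative (the model and field names are assumptions) and written as Django models: each subclass row shares the primary key of its Employee row, and the boss FK targets only salaried employees.

from django.db import models

class Employee(models.Model):
    name = models.CharField(max_length=200)
    employee_type = models.CharField(max_length=20)   # 'hourly' | 'salaried' | 'consultant'

class SalariedEmployee(models.Model):
    employee = models.OneToOneField(Employee, primary_key=True, on_delete=models.CASCADE)
    annual_salary = models.DecimalField(max_digits=12, decimal_places=2)
    # A salaried employee may report to another salaried employee.
    boss = models.ForeignKey("self", null=True, blank=True, on_delete=models.SET_NULL)

class HourlyEmployee(models.Model):
    employee = models.OneToOneField(Employee, primary_key=True, on_delete=models.CASCADE)
    hourly_rate = models.DecimalField(max_digits=8, decimal_places=2)
    boss = models.ForeignKey(SalariedEmployee, null=True, blank=True, on_delete=models.SET_NULL)

class Consultant(models.Model):
    employee = models.OneToOneField(Employee, primary_key=True, on_delete=models.CASCADE)
    contract_rate = models.DecimalField(max_digits=10, decimal_places=2)
    boss = models.ForeignKey(SalariedEmployee, null=True, blank=True, on_delete=models.SET_NULL)

Because the boss FK targets the SalariedEmployee table rather than Employee, the database itself prevents an hourly employee or consultant from being recorded as anyone's boss.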