I am trying to model a concept using Object-Role Modeling (ORM), and I can't find the necessary constraint type. I'm wondering if it exists.
Here are three facts:
Commodity must be of one CommodityCategory
EntityDescriptor must be of one CommodityCategory
EntityDescriptor may be for one Commodity
This is straightforward to model.
But here's the constraint:
If an EntityDescriptor is for a Commodity, the CommodityCategory referenced by the Commodity must equal the CommodityCategory referenced by the EntityDescriptor
For example, suppose we had these commodities.
*--------------------*------------*
| CommodityCategory | Commodity |
*--------------------*------------*
| Fuel | Gas |
| Fuel | Petrol |
| Food | Sugar |
*--------------------*------------*
These are legal:
*------------------*-------------------*-----------*
| EntityDescriptor | CommodityCategory | Commodity |
*------------------*-------------------*-----------*
| 1 | Fuel | |
| 2 | Fuel | Gas |
| 3 | Food | |
| 4 | Food | Sugar |
*------------------*-------------------*-----------*
But this is illegal:
*------------------*-------------------*-----------*
| EntityDescriptor | CommodityCategory | Commodity |
*------------------*-------------------*-----------*
| 5 | Food | Petrol |
*------------------*-------------------*-----------*
I looked at the Equality constraint, but that is about the existence of the relationship, not the actual values in the relationship.
Is there something I can use to model this constraint?
Written in CQL (see the ActiveFacts home page), you need a subset constraint like this:
some EntityDescriptor references some Commodity
only if that EntityDescriptor is for some CommodityCategory and that Commodity is of that CommodityCategory;
Note that this becomes more fluent if you include a reading in each direction.
In NORMA, you need a subset constraint that has two role pairs:
The subset pair is the two roles of "EntityDescriptor references Commodity".
The superset pair is the role of EntityDescriptor in "EntityDescriptor is for CommodityCategory"
and the role of Commodity in "Commodity is of CommodityCategory".
Note that the first role of each pair is played by the same type (EntityDescriptor),
and likewise with the second role of each pair (Commodity). It's also possible to use compatible subtype/supertypes, but the types must be compatible in this way.
An equality constraint is like two subset constraints, one running in each direction. In addition to the subset above, it would require that a reference exists whenever an EntityDescriptor is for some CommodityCategory and some Commodity is of that same CommodityCategory, which is stronger than the rule you described.
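To make the semantics of that subset constraint concrete, here is a small Python sketch (an illustration only, not something NORMA or ActiveFacts generates). It treats each fact type as a set of pairs using the populations from the question, with EntityDescriptor 5 included so the illegal row shows up. The superset is the join of "EntityDescriptor is for CommodityCategory" and "Commodity is of CommodityCategory" on the shared category, and any referenced pair outside that join violates the constraint. The function and variable names are hypothetical.

```python
# Populations copied from the question's example tables (illustrative data).
commodity_is_of = {"Gas": "Fuel", "Petrol": "Fuel", "Sugar": "Food"}
descriptor_is_for = {1: "Fuel", 2: "Fuel", 3: "Food", 4: "Food", 5: "Food"}
descriptor_references = {(2, "Gas"), (4, "Sugar"), (5, "Petrol")}


def superset_pairs(is_for, is_of):
    """Join the two superset fact types on CommodityCategory, giving every
    (EntityDescriptor, Commodity) pair that shares a category."""
    return {(ed, cm)
            for ed, ed_cat in is_for.items()
            for cm, cm_cat in is_of.items()
            if ed_cat == cm_cat}


def subset_violations(references, is_for, is_of):
    """Pairs in the subset role pair that are missing from the superset join."""
    return references - superset_pairs(is_for, is_of)


print(subset_violations(descriptor_references, descriptor_is_for, commodity_is_of))
# → {(5, 'Petrol')}
```

An equality constraint would additionally demand `superset_pairs(...) <= descriptor_references`, which fails as soon as two Fuel commodities exist, since a descriptor may reference at most one of them.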
Can't we add a subset constraint between the roles of "is of" and "is for", so that every Commodity-to-category pair is a subset of the EntityDescriptor-to-category pairs?
The tables go like this: ED (EntityDescriptor), CC (CommodityCategory), CM (Commodity)
ED CC <---> CC CM ED <---> CM CC
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 5 5 // error, because CC has no 5,5 pair to ED
4 4 4 4 4 4 4
5 4 4 5 4 5 4 // ok, because CC has 4 to 5 on CC-ED
6 4 6 3 // error, because ED-CC doesn't have 6,3
So we can see that CC has two roles, one toward ED (r1) and one toward CM (r2), and r2 is a subset of r1. So I think Commodity doesn't need a constraint role directly to ED; the constraint is applied through CC.
If you want to enforce this in a database, I'd recommend BEFORE INSERT/UPDATE triggers to prevent associating an EntityDescriptor and a Commodity that do not match.
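A minimal sketch of such triggers, using SQLite through Python's built-in sqlite3 module so it runs without a server. The table and column names (commodity, entity_descriptor, category) are assumptions; trigger syntax differs in MySQL, SQL Server, etc. Only changes to entity_descriptor are guarded here; changing a commodity's category after the fact would need its own trigger on the commodity table (not shown).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE commodity (
    name     TEXT PRIMARY KEY,
    category TEXT NOT NULL
);
CREATE TABLE entity_descriptor (
    id        INTEGER PRIMARY KEY,
    category  TEXT NOT NULL,
    commodity TEXT REFERENCES commodity(name)  -- optional, may be NULL
);
-- Reject an insert whose commodity belongs to a different category.
CREATE TRIGGER ed_category_check_insert
BEFORE INSERT ON entity_descriptor
WHEN NEW.commodity IS NOT NULL
 AND NEW.category <> (SELECT category FROM commodity WHERE name = NEW.commodity)
BEGIN
    SELECT RAISE(ABORT, 'commodity category does not match descriptor category');
END;
-- Same rule when either column is updated.
CREATE TRIGGER ed_category_check_update
BEFORE UPDATE OF category, commodity ON entity_descriptor
WHEN NEW.commodity IS NOT NULL
 AND NEW.category <> (SELECT category FROM commodity WHERE name = NEW.commodity)
BEGIN
    SELECT RAISE(ABORT, 'commodity category does not match descriptor category');
END;
""")


def try_insert(ed_id, category, commodity):
    """Insert one EntityDescriptor row; return False if the database rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO entity_descriptor VALUES (?, ?, ?)",
                         (ed_id, category, commodity))
        return True
    except sqlite3.DatabaseError:  # trigger RAISE and FK failures both land here
        return False


with conn:
    conn.executemany("INSERT INTO commodity VALUES (?, ?)",
                     [("Gas", "Fuel"), ("Petrol", "Fuel"), ("Sugar", "Food")])

# The "legal" rows from the question are accepted...
for row in [(1, "Fuel", None), (2, "Fuel", "Gas"), (3, "Food", None), (4, "Food", "Sugar")]:
    assert try_insert(*row)
# ...and the "illegal" row is rejected.
print(try_insert(5, "Food", "Petrol"))  # → False
```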
If you are thinking of using code, I'd recommend studying the Specification Pattern. Assume Commodity, EntityDescriptor and CommodityDescriptor classes. CommodityDescriptor would be part of the composition of the other two classes. Commodity would include a Specification, say MatchingCommidityDescriptionSpecification (yes, it's verbose), as part of its composition. Then, when Commodity.setEntityDescription(EntityDescription entityDescriptor) is called, it validates against the Specification by comparing the CommodityDescriptor values of the Commodity and the EntityDescriptor.
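A minimal Python sketch of that Specification approach, assuming the class layout described above. The specification class is spelled MatchingCommodityDescriptorSpecification here for readability, and set_entity_descriptor stands in for setEntityDescription; all names are hypothetical and not from any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommodityDescriptor:
    """Hypothetical value object shared by Commodity and EntityDescriptor."""
    name: str


@dataclass
class EntityDescriptor:
    descriptor: CommodityDescriptor


class MatchingCommodityDescriptorSpecification:
    """Satisfied when both objects carry an equal CommodityDescriptor."""

    def is_satisfied_by(self, commodity: "Commodity",
                        entity_descriptor: EntityDescriptor) -> bool:
        return commodity.descriptor == entity_descriptor.descriptor


class Commodity:
    def __init__(self, name: str, descriptor: CommodityDescriptor,
                 spec: Optional[MatchingCommodityDescriptorSpecification] = None):
        self.name = name
        self.descriptor = descriptor
        # The specification is part of the Commodity's composition.
        self.spec = spec or MatchingCommodityDescriptorSpecification()
        self.entity_descriptor: Optional[EntityDescriptor] = None

    def set_entity_descriptor(self, entity_descriptor: EntityDescriptor) -> None:
        # Validate against the specification before accepting the link.
        if not self.spec.is_satisfied_by(self, entity_descriptor):
            raise ValueError(
                f"{self.name}: descriptor {entity_descriptor.descriptor.name!r} "
                f"does not match {self.descriptor.name!r}")
        self.entity_descriptor = entity_descriptor


fuel, food = CommodityDescriptor("Fuel"), CommodityDescriptor("Food")
gas = Commodity("Gas", fuel)
gas.set_entity_descriptor(EntityDescriptor(fuel))  # accepted
petrol = Commodity("Petrol", fuel)
try:
    petrol.set_entity_descriptor(EntityDescriptor(food))  # rejected
except ValueError as err:
    print(err)
```

Injecting the specification through the constructor keeps the rule swappable, which is the main point of the pattern.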
I have two source tables, one is basically an invoice, the other is a migrated invoice. The same object should probably have been used for both, but I have this instead. They contain most of the same data.
I had thought to combine both into a dimension table, however both will use the same natural keys. How should I approach this?
One potential solution I thought of was using negative numbers for the migrated table, but then the natural keys won't align exactly with the source.
Do I just combine them in the fact table? Then I can't link back to the dimension table for either due to NULLs.
Or do I add an additional column or information to indicate which type of invoice it is?
EDIT
Simple models of the current tables below.
The dimension currently contains only the non-migrated data and has a primary key. However,
if I merge the migrated invoice table into it, it will appear as if the changes are being
made to the original invoices rather than to a second set of invoices.
Dimension
surrogate_key | source_pk | Total | scd_from | scd_to
______________________________________________________
1 | 1 | 100 | 01/01/2019 | 31/01/2019
2 | 1 | 150 | 01/02/2019 | 31/12/2019
3 | 2 | 50 | 01/01/2019 | 31/12/9999
source invoice table
pk | Total
___________________
1 | 150
2 | 50
source migrated invoice table
pk | total
___________________
1 | 200
2 | 300
If the invoice and the migrated invoice share the same natural key but some of the fields have different values (your example shows a different Total amount between them), then keep one row per natural key in the dimension, with 2 different columns to represent the 2 sources. Based on your example, you need invoice_Total and migrated_invoice_Total columns in your dimension.
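A minimal Python sketch of building those rows from the two example source tables, assuming plain dicts keyed by the natural key. The surrogate key and SCD columns are left out to keep the merge logic visible; in a real ETL this would typically be a full outer join on the natural key.

```python
# Source rows from the example, keyed by natural key (pk -> total).
invoices = {1: 150, 2: 50}
migrated_invoices = {1: 200, 2: 300}


def build_dim_rows(invoice_totals, migrated_totals):
    """One row per natural key; each source keeps its own Total column.
    A key present in only one source gets None (NULL) in the other column."""
    rows = []
    for key in sorted(invoice_totals.keys() | migrated_totals.keys()):
        rows.append({
            "source_pk": key,
            "invoice_Total": invoice_totals.get(key),
            "migrated_invoice_Total": migrated_totals.get(key),
        })
    return rows


dim_rows = build_dim_rows(invoices, migrated_invoices)
for row in dim_rows:
    print(row)
```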
I have a super-class/subclass hierarchical relationship as follows:
Super-class: IT Specialist
Sub-classes: Databases, Java, UNIX, PHP
Given that each instance of a super-class may not be a member of a subclass and a super-class instance may be a member of two or more sub-classes, how would I go about implementing this system?
I haven't been given any attributes to assign to the entities so I find this very vague and I'm at a loss where to start.
To get started, you would have one table that contains all of your super-classes (in your example case, there would only be IT Specialist, but it could also contain things like Networking Specialist, or Digital Specialist). I've included these to give a bit more flavour:
ID | Name |
-----------------------------
1 | IT Specialist |
2 | Networking Specialist |
3 | Digital Specialist |
You also would have another table that contains all of your sub-classes:
ID | Name |
--------------------
1 | Databases |
2 | Java |
3 | UNIX |
4 | PHP |
For example, let's say that a Networking Specialist needs to know about Databases, and a Digital Specialist needs to know about both Java and PHP. An IT Specialist would need to know all four fields listed above.
There are two possible ways to go about this. One such way would be to set 'flags' in the sub-class table:
ID | Name | Is_IT | Is_Networking | Is_Digital
----------------------------------------------------
1 | Databases | 1 | 1 | 0
2 | Java | 1 | 0 | 1
3 | UNIX | 1 | 0 | 0
4 | PHP | 1 | 0 | 1
Keep in mind, this is only using a small number of skills. If you started to have a lot of super-classes, the columns in the sub-class table could get out of hand pretty quickly.
Fortunately, you can also use something known as a bridging table (also known as an associative entity). Essentially, a bridging table holds two foreign keys, each referencing the primary key of another table, which resolves a many-to-many relationship.
You would set this up by having a new table that associates which sub-classes belong with which super-classes:
ID | Sub-class ID | Super-class ID |
-------------------------------------
1 | 1 | 1 |
2 | 1 | 2 |
3 | 2 | 1 |
4 | 2 | 3 |
5 | 3 | 1 |
6 | 4 | 1 |
7 | 4 | 3 |
Note that there are 'duplicates' in both the sub-class ID and super-class ID fields, yet no duplicates in the ID field. This is because the bridging table has unique IDs, which it uses to make independent associations. Sub-class 1 (Databases) needs to be associated to two different groups (IT Specialist and Networking Specialist). Thus, two different associations need to be formed.
Both approaches above give the same 'result'. The only real difference here is that a bridging table will give you more rows, while setting multiple flags will give you more columns. Obviously, the way in which you craft your query will be different as well.
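A minimal sketch comparing the two queries on the example data, using SQLite through Python's sqlite3 module. The table and column names (super_class, sub_class, sub_super, is_it, is_networking, is_digital) are hypothetical, and the UNIQUE pair constraint on the bridging table is an addition to stop the same association being stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE super_class (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sub_class (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    is_it INTEGER, is_networking INTEGER, is_digital INTEGER  -- flag approach
);
CREATE TABLE sub_super (  -- bridging table
    id             INTEGER PRIMARY KEY,
    sub_class_id   INTEGER NOT NULL REFERENCES sub_class(id),
    super_class_id INTEGER NOT NULL REFERENCES super_class(id),
    UNIQUE (sub_class_id, super_class_id)
);
INSERT INTO super_class VALUES
    (1, 'IT Specialist'), (2, 'Networking Specialist'), (3, 'Digital Specialist');
INSERT INTO sub_class VALUES
    (1, 'Databases', 1, 1, 0), (2, 'Java', 1, 0, 1), (3, 'UNIX', 1, 0, 0), (4, 'PHP', 1, 0, 1);
INSERT INTO sub_super VALUES
    (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 3), (5, 3, 1), (6, 4, 1), (7, 4, 3);
""")

# Flag approach: the super-class is baked into a column name.
flags = conn.execute(
    "SELECT name FROM sub_class WHERE is_digital = 1 ORDER BY id").fetchall()

# Bridging-table approach: the super-class is data, reached through a join.
bridge = conn.execute("""
    SELECT s.name
    FROM sub_class s
    JOIN sub_super ss ON ss.sub_class_id = s.id
    JOIN super_class p ON p.id = ss.super_class_id
    WHERE p.name = 'Digital Specialist'
    ORDER BY s.id
""").fetchall()

print(flags == bridge, flags)  # → True [('Java',), ('PHP',)]
```

Adding a new super-class means a schema change (new column) with flags, but only new rows with the bridging table.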
Which of the two approaches you choose to go with really depends on how much data you're dealing with, and how much scope the database is going to have for expansion in the future :)
Hope this helps! :)
Fully optional one to one relation in MySQL workbench?
I'm only able to create a partially optional one to one relation.
My case is:
A GROUP can be assigned a PROBLEM
A PROBLEM can be assigned to a GROUP
EDIT1:
EDIT2: Maybe a better question would be whether fully optional one-to-one relations should be avoided?
Let's see if any of this addresses your issue.
A GROUP can be assigned a PROBLEM
A PROBLEM can be assigned to a GROUP
Starting with a structure such as:
PROBLEM
id | title
1 | Prob1
2 | Prob2
GROUP
id | title
1 | Group1
2 | Group2
What is also important is to know whether a GROUP can be assigned more than one problem at a time or not. And whether one same problem can be assigned to more than one GROUP.
Let's say there is a strict optional 1:1 relationship. This means a group cannot have 2 problems assigned at the same time and that 1 same problem cannot be assigned to 2 groups.
A strict 1:1 would be implemented by adding the PK of table A as a FK in table B, with a UNIQUE constraint on that FK so that no two rows of B can reference the same row of A. If the FK is nullable, you will notice this is already an optional 1:1, as you may leave empty cells indicating 0 problems assigned (or 0 groups assigned).
PROBLEM
id | title
1 | Prob1
2 | Prob2
GROUP
id | title | problem
1 | Group1 | 2
2 | Group2 | null
In this example Group2 has been assigned no problem. Group1 has been assigned Prob2 and Prob1 has been assigned to no group.
You are not forced to assign anything but everything may have a 1:1 relationship.
This structure may imply quite a few empty (null) values. This is not best practice, but it would do the job. If you want to avoid null values, you may have to go for an N:M implementation.
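A minimal sketch of the nullable, UNIQUE foreign key version, using SQLite through Python's sqlite3 module so it runs standalone. In MySQL the column definition is similar, and GROUP must likewise be quoted (with backticks) because it is a reserved word. The add_group helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK checks are off by default in SQLite
conn.executescript("""
CREATE TABLE problem (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE "group" (                     -- GROUP is a reserved word, so quote it
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    problem INTEGER UNIQUE REFERENCES problem(id)  -- nullable FK + UNIQUE = optional 1:1
);
INSERT INTO problem VALUES (1, 'Prob1'), (2, 'Prob2');
INSERT INTO "group" VALUES (1, 'Group1', 2), (2, 'Group2', NULL);
""")


def add_group(group_id, title, problem_id):
    """Try to insert a group; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute('INSERT INTO "group" VALUES (?, ?, ?)',
                         (group_id, title, problem_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_group(3, "Group3", 2))     # → False: Prob2 is already assigned to Group1
print(add_group(4, "Group4", None))  # → True: several groups may have no problem
```

A UNIQUE column still accepts many NULLs (in both SQLite and MySQL), which is what makes the "no problem assigned" case work.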
PROBLEM
id | title
1 | Prob1
2 | Prob2
GROUP
id | title
1 | Group1
2 | Group2
GROUP_PROBLEM
group | problem
1 | 2
With this implementation alone, 1 group may be assigned more than 1 problem and 1 same problem may be assigned to more than 1 group. But if you define a UNIQUE index on each of the two fields (group and problem), each group and each problem can appear at most once, which restores the optional 1:1.
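A minimal sketch of the junction-table version with one UNIQUE per column, again in SQLite via Python's sqlite3 module. The assign helper is hypothetical; a group or problem with no row in GROUP_PROBLEM simply has no assignment, so no NULLs are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE problem (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE "group" (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Junction table: one UNIQUE per column keeps the relationship 1:1.
CREATE TABLE group_problem (
    "group" INTEGER NOT NULL UNIQUE REFERENCES "group"(id),
    problem INTEGER NOT NULL UNIQUE REFERENCES problem(id)
);
INSERT INTO problem VALUES (1, 'Prob1'), (2, 'Prob2');
INSERT INTO "group" VALUES (1, 'Group1'), (2, 'Group2');
INSERT INTO group_problem VALUES (1, 2);
""")


def assign(group_id, problem_id):
    """Try to link a group to a problem; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO group_problem VALUES (?, ?)",
                         (group_id, problem_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(assign(1, 1))  # → False: Group1 already has a problem
print(assign(2, 2))  # → False: Prob2 already belongs to a group
```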
TLDR:
What are the pros and cons of each of these database configurations in terms of performance, ease of management and any other considerations you might think of?
If there are only two types of 'loc'ations and only two types of entities, how would each of these three table configurations compare with the others:
Config 1:
loc | allow | entity | entity_type | loc_type
2 | True | jim | user | zone
29 | False | officer| class | quadrant
Config 2:
zone | allow | entity | entity_type
2 | True | jim | user
&
quadrant | allow | entity | entity_type
29 | False | officer| class
Config 3:
zone | allow | user
2 | True | jim
&
zone | allow | class
&
quadrant | allow | user
&
quadrant | allow | class
29 | False | officer
The first example combines all the data into one table (let's call it "permissions") and has extra columns defining the type of data in the loc and entity columns. The last example breaks it all up into 4 tables, which might be named "zone_user_permissions", "zone_class_permissions", "quadrant_user_permissions" and "quadrant_class_permissions".
The middle one is a bit of a compromise between the first and third.
An obvious difference is that the first, single-table configuration has extra columns and thus stores more "extra" data than the last configuration, whereas the third one is spread out over multiple tables and might be more difficult to query in some circumstances.
So assuming there would be approximately an equal distribution of rows among all tables, what would the benefits and downsides be to each of these configurations?
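To illustrate the "more difficult to query" point, here is a sketch of one cross-cutting question ("list every denied permission") against Config 1 and Config 3, using SQLite via Python's sqlite3 module and the table names suggested above. The allow flag is stored as 1/0 here, an assumption since SQLite has no native boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Config 1: one table, with type columns.
CREATE TABLE permissions (loc INTEGER, allow INTEGER, entity TEXT,
                          entity_type TEXT, loc_type TEXT);
INSERT INTO permissions VALUES (2, 1, 'jim', 'user', 'zone'),
                               (29, 0, 'officer', 'class', 'quadrant');
-- Config 3: four tables, with the types encoded in the table names.
CREATE TABLE zone_user_permissions (zone INTEGER, allow INTEGER, user TEXT);
CREATE TABLE zone_class_permissions (zone INTEGER, allow INTEGER, class TEXT);
CREATE TABLE quadrant_user_permissions (quadrant INTEGER, allow INTEGER, user TEXT);
CREATE TABLE quadrant_class_permissions (quadrant INTEGER, allow INTEGER, class TEXT);
INSERT INTO zone_user_permissions VALUES (2, 1, 'jim');
INSERT INTO quadrant_class_permissions VALUES (29, 0, 'officer');
""")

# Config 1: a single filter.
denied_config1 = conn.execute("""
    SELECT loc_type, loc, entity_type, entity FROM permissions WHERE allow = 0
""").fetchall()

# Config 3: the same question needs a UNION ALL over all four tables,
# re-creating the type columns as literals.
denied_config3 = conn.execute("""
    SELECT 'zone', zone, 'user', user FROM zone_user_permissions WHERE allow = 0
    UNION ALL
    SELECT 'zone', zone, 'class', class FROM zone_class_permissions WHERE allow = 0
    UNION ALL
    SELECT 'quadrant', quadrant, 'user', user FROM quadrant_user_permissions WHERE allow = 0
    UNION ALL
    SELECT 'quadrant', quadrant, 'class', class FROM quadrant_class_permissions WHERE allow = 0
""").fetchall()

print(denied_config1 == denied_config3)  # → True
```

Conversely, a narrow lookup such as "may jim enter zone 2?" touches only one small table in Config 3, so which layout is easier depends on which queries dominate.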
I find the definition of 1NF vague when I search for it on Google.
Some sites, like this one, say a table is in first normal form when it doesn't have any repeating set of columns.
Others (most of them) say multiple values of the same domain shouldn't exist in the same column.
And some say all tables should have a primary key, while others don't mention primary keys at all!
Can someone explain this to me?
A relation is in first normal form if it has the property that none of its domains has elements which are themselves sets.
From E. F. Codd (Oct 1972). "Further normalization of the database relational model"
This gets right to the heart of it, and it comes from the person who invented the relational database model.
When something is in the first normal form, there are no columns which themselves contain sets of data.
The Wikipedia article on first normal form demonstrates this with a denormalized table:
Example1:
Customer
Customer ID | First Name | Surname | Telephone Number
123 | Robert | Ingram | 555-861-2025
456 | Jane | Wright | 555-403-1659, 555-776-4100
789 | Maria | Fernandez | 555-808-9633
This table is not in 1NF because Jane has a telephone number that is a set. Writing the table as follows still violates 1NF:
Example2:
Customer
Customer ID | First Name | Surname | Telephone Number
123 | Robert | Ingram | 555-861-2025
456 | Jane | Wright | 555-403-1659
456 | Jane | Wright | 555-776-4100
789 | Maria | Fernandez | 555-808-9633
The proper way to normalize the table is to break it out into two tables.
Example3:
Customer
Customer ID | First Name | Surname
123 | Robert | Ingram
456 | Jane | Wright
789 | Maria | Fernandez
Phone
Customer ID | Telephone Number
123 | 555-861-2025
456 | 555-403-1659
456 | 555-776-4100
789 | 555-808-9633
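The split from Example1 to Example3 is mechanical, so here is a short Python sketch of it, assuming the multi-valued Telephone Number is stored as a comma-separated string as in Example1. The to_1nf helper is hypothetical.

```python
# Example1 rows: (Customer ID, First Name, Surname, Telephone Number)
unnormalized = [
    (123, "Robert", "Ingram", "555-861-2025"),
    (456, "Jane", "Wright", "555-403-1659, 555-776-4100"),
    (789, "Maria", "Fernandez", "555-808-9633"),
]


def to_1nf(rows):
    """Split each row into a Customer row and one Phone row per number."""
    customers, phones = [], []
    for cid, first, last, tel in rows:
        customers.append((cid, first, last))
        for number in tel.split(","):
            phones.append((cid, number.strip()))  # one atomic value per cell
    return customers, phones


customers, phones = to_1nf(unnormalized)
print(phones)
# → [(123, '555-861-2025'), (456, '555-403-1659'), (456, '555-776-4100'), (789, '555-808-9633')]
```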
Another way of looking at 1NF is as defined by Chris Date (from Wikipedia):
There's no top-to-bottom ordering to the rows.
There's no left-to-right ordering to the columns.
There are no duplicate rows.
Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
Example2 lacks a unique key, which violates rule 3. Example1 violates rule 4 in that the telephone number contains multiple values.
Only Example3 fulfills all of those requirements.
Further reading:
Simple Guide to Five Normal Forms in Relational Database Theory
The simplest explanation I have found is this modified definition copied from here:
1st Normal Form Definition
A database is in first normal form if it satisfies the following conditions:
1) Contains only atomic values
2) There are no repeating groups