Database design, how to display dependency inside a table? - sql-server

My question title might be a little bit misleading, since I don't know how to word it. Sorry about that.
I have a table called course which holds a list of courses with ID and Name columns
ID Name
---- ---------
1 JAVA
2 C#
3 C++
4 HTML
5 PHP
6 JAVASCRIPT
7 HARDWARE
8 PERL
9 CSS
There is a simple app: a student asks whether he can enroll in a particular course, and the system checks whether he has finished the prerequisites. To take a particular course, you need to finish one or more prerequisites. Here are some silly examples:
To do JAVA, you have to finish HARDWARE and HTML
To do C++, you have to finish HARDWARE and PHP
To do CSS, you have to finish JAVA
How can I represent this relationship in the database? Do I need to add a new column to achieve this?
Thanks a lot for your help.

You need to introduce a second table called CoursePrerequisite. It could have the following columns:
Id
CourseId
PreReqCourseId
sample entries
Id CourseId PreReqCourseId
--- --------- --------------
1 1 4
2 1 7
The CourseId / PreReqCourseId combination has to be defined as unique in the table. You could of course do away with the Id column in the second table, but I personally like to use an Id column in all my tables; it makes updating the table easier.

You need to define the dependency relationship in your table design. Per your description, you have the following scenarios:
One Course can depend on many
One Course can be a dependency for many
That is a many-to-many relationship, which is best expressed as an extra table containing a Foreign key to each of the relationship sides (in your case both are courses, from the same table). The new table design should be as Raj highlighted in his answer:
Id CourseId PreReqCourseId
--- --------- --------------
1 1 4
2 1 7
3 2 8
4 2 3
All you need later to know which courses a given course depends on is to run:
SELECT PreReqCourseId FROM ThisNewTable WHERE CourseId = @Value
Remember to replace @Value with the Id of the course you are looking for, or pass it as a parameter if your script is called from an application (C# or otherwise).
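To make the enrollment check concrete, here is a minimal sketch of the two-table design described above. It is only an illustration: it uses Python's built-in sqlite3 module instead of SQL Server so that it runs on its own, and the table name CoursePrerequisite and the helper missing_prerequisites are names assumed for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Course (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE
);
CREATE TABLE CoursePrerequisite (
    Id             INTEGER PRIMARY KEY,
    CourseId       INTEGER NOT NULL REFERENCES Course(Id),
    PreReqCourseId INTEGER NOT NULL REFERENCES Course(Id),
    UNIQUE (CourseId, PreReqCourseId),   -- each pair is stored once
    CHECK (CourseId <> PreReqCourseId)   -- a course cannot require itself
);
INSERT INTO Course (Id, Name) VALUES
    (1, 'JAVA'), (2, 'C#'), (3, 'C++'), (4, 'HTML'), (5, 'PHP'),
    (6, 'JAVASCRIPT'), (7, 'HARDWARE'), (8, 'PERL'), (9, 'CSS');
-- JAVA needs HTML and HARDWARE; C++ needs HARDWARE and PHP; CSS needs JAVA
INSERT INTO CoursePrerequisite (CourseId, PreReqCourseId) VALUES
    (1, 4), (1, 7), (3, 7), (3, 5), (9, 1);
""")


def missing_prerequisites(conn, course_id, finished_ids):
    """Return the names of prerequisites of course_id the student has not finished."""
    rows = conn.execute(
        """SELECT c.Id, c.Name
           FROM CoursePrerequisite AS p
           JOIN Course AS c ON c.Id = p.PreReqCourseId
           WHERE p.CourseId = ?
           ORDER BY c.Name""",
        (course_id,),
    ).fetchall()
    finished = set(finished_ids)
    return [name for (prereq_id, name) in rows if prereq_id not in finished]
```

A student who has finished only HTML (Id 4) and asks for JAVA (Id 1) would get back a list containing HARDWARE; an empty list means the student may enroll.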


What database lends itself to a composition relationship between entities?

TLDR: Looking for a free database option to run locally that lends itself to composition. Object A is composed of B,C,D. Where B,C are of the same type as A. What should I use?
I would like to experiment with some open source database. I am playing a game that has crafting and it is a bit cumbersome to drill down into the objects to figure out what resources I need. This seems like a good problem to solve via a database. I was hoping to explore a NoSQL option as I do not have much experience with them.
To use a simple contrived example:
A staff: requires 5 wood
A spearhead: requires 2 iron
A spear: requires 1 staff, 1 spearhead
A trident: requires 1 staff, 3 spearheads, 2 iron
If I wanted to build 2 tridents and 1 spear a query to my database would inform me I need 15 wood, 18 iron.
So each craftable item would require some combination of base resources and/or other crafted items. The question to be put to the database is, given that I already possess some resources, what is remaining for me to collect to build some combination of items?
If I were to attempt this in SQL I would make 4 tables:
A resource table (the raw materials needed for crafting)
An item table (the things I can craft)
A many to many table, mapping items to items
A many to many table, mapping items to resources
What would you recommend I use? An answer might be, there are no NoSQL databases that lend themselves well to your problem set (model and queries).
Using the Bill of Materials picture I linked to in the comment, you have a Resource table.
Resource
--------
Resource ID
Resource Name
Here are some rows, based on your example. I deliberately added spearhead after spear. The order of the resources doesn't matter.
Resource ID | Resource Name
---------------------------
1 Wood
2 Iron
3 Staff
4 Spear
5 Spearhead
6 Trident
Next, you have a ResourceHierarchy table.
ResourceHierarchy
-----------------
ResourceHierarchy ID
Resource ID
Parent Resource ID (FK)
Resource Quantity
Here are some rows, again based on your example. Each row says how many of a resource its parent needs.
ResourceHierarchy ID | Resource ID | Parent Resource ID | Resource Quantity
--------------------------------------------------------------------------
1 6 null null
2 5 6 3
3 3 6 1
4 2 6 2
5 3 4 1
6 5 4 1
7 2 5 2
8 1 3 5
Admittedly, this is difficult to create by hand. I probably made some errors in my example. You would have a part of your application that allows you to create Resource and ResourceHierarchy rows using the actual resource names.
You have to make several queries to retrieve all of the components for a top-level resource, starting with the row whose Parent Resource ID is null and querying your way down through the resources. That's the disadvantage of a Bill of Materials.
The advantage of a Bill of Materials is there's no limit to the nesting and you can freely combine items and resources to make more items.
You can identify resources and items with a flag in your Resource table if you wish.
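If the database supports recursive common table expressions, the repeated lookups can usually be folded into one query. The sketch below is an illustration under assumptions rather than part of the design above: it uses Python's built-in sqlite3 module, snake_case table and column names chosen for the example, and a hypothetical helper raw_materials. It leaves out the null-parent rows because the walk starts from the item name.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resource (
    resource_id   INTEGER PRIMARY KEY,
    resource_name TEXT NOT NULL UNIQUE
);
-- One row per "parent needs <quantity> of resource" relationship.
CREATE TABLE resource_hierarchy (
    resource_id        INTEGER NOT NULL REFERENCES resource(resource_id),
    parent_resource_id INTEGER NOT NULL REFERENCES resource(resource_id),
    quantity           INTEGER NOT NULL CHECK (quantity > 0)
);
INSERT INTO resource VALUES
    (1, 'Wood'), (2, 'Iron'), (3, 'Staff'),
    (4, 'Spear'), (5, 'Spearhead'), (6, 'Trident');
INSERT INTO resource_hierarchy VALUES
    (1, 3, 5),                        -- staff: 5 wood
    (2, 5, 2),                        -- spearhead: 2 iron
    (3, 4, 1), (5, 4, 1),             -- spear: 1 staff, 1 spearhead
    (3, 6, 1), (5, 6, 3), (2, 6, 2);  -- trident: 1 staff, 3 spearheads, 2 iron
""")

# Walk down from the requested item, multiplying quantities along the way,
# then keep only resources that have no components of their own.
QUERY = """
WITH RECURSIVE needs(resource_id, qty) AS (
    SELECT resource_id, :count FROM resource WHERE resource_name = :name
    UNION ALL
    SELECT h.resource_id, n.qty * h.quantity
    FROM needs AS n
    JOIN resource_hierarchy AS h ON h.parent_resource_id = n.resource_id
)
SELECT r.resource_name, SUM(n.qty)
FROM needs AS n
JOIN resource AS r ON r.resource_id = n.resource_id
WHERE NOT EXISTS (
    SELECT 1 FROM resource_hierarchy AS h
    WHERE h.parent_resource_id = n.resource_id
)
GROUP BY r.resource_name
"""


def raw_materials(conn, wanted):
    """Total base resources for a mapping of item name -> number to build."""
    totals = Counter()
    for name, count in wanted.items():
        for raw_name, qty in conn.execute(QUERY, {"name": name, "count": count}):
            totals[raw_name] += qty
    return dict(totals)
```

Calling raw_materials(conn, {"Trident": 2, "Spear": 1}) reproduces the 15 wood and 18 iron from the question. Subtracting what you already own from those totals would give what is left to collect.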
You might want to consider a graph data model, such as JanusGraph, where entities (nodes) could be members of a set (defined as another node) via a relationship (edge).
That would allow you to have multi-child or multi-parent relationships you are talking about.
Mother == married to == Father
child1, child2, child 3 ... childn
Would all then have a "childOf" relationship to both the mother and separately to the father, and would be "siblingOf" the other members of the set, as labeled along their edges.
Make sense?
Here's more of the types of edge labels and multiplicities you can have:
https://docs.janusgraph.org/basics/schema/
Disclosure: I work for ScyllaDB, and our database is often used as a storage engine under JanusGraph implementations. There are many other types of NoSQL graph databases you can check out. Find the one that's right for your use case & data model.
Edit: JanusGraph is open source, as is Scylla:
https://github.com/JanusGraph
https://github.com/scylladb/scylla

design table database, duplicated register or create another table?

I would like to know your opinion about this situation:
I have a table named "movie" with these columns:
movie_id
name
price
...... etc
A movie can be available to rent, purchased or both.
If I want a movie available to rent and purchase the price change, for example:
Price for rent: $2.50
Price for purchase: $15.45
The question is:
Is it better to duplicate the row in the movie table?
movie_id name price available_for ...... ........
1 300 $2.50 rent
2 300 $15.45 purchase
Or make another table adding the info of price and available_for? Like this:
Table Movie
movie_id name ...... .......... ..........
1 300
2 300
Table Movie_available_for
Id movie_id available_for price
1 1 rent $2.50
2 1 purchase $15.45
I want to know which is the best solution for this
Thanks!
Your relational approach might depend on what level of normalization you hope to achieve. Your question reminds me a lot of the Boyce–Codd normal form (BCNF) vs the 3rd normal form (3NF).
In fact, there is an example similar to your question on this wiki page: Boyce–Codd normal form (Wikipedia)
There is a lot of theory here, but it can many times come down to either what you feel the most comfortable with or whichever technique you can perform the most accurately.
Personally, in this specific case, I would go with the slightly more normalized form (your 2nd example). This is because "available_for" and "price" are related variables. With the duplicated-row design, any further info you add about movies is potentially going to be duplicated many times. If you add a third "available_for" value or different pricing schemes (1 day for $1.50, 5 days for $4), you will have very significant data duplication.
Besides, when it comes to code, it would be nice to have a movie object that has an array of nested "available_for" objects (you might name these something else, like "offering").
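As a rough illustration of the second design, here is a sketch using Python's built-in sqlite3 module. The table name movie_offering, the price_cents column (prices stored as integer cents to avoid rounding problems) and the helper offerings are assumptions made for this example, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE movie (
    movie_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE movie_offering (
    movie_id      INTEGER NOT NULL REFERENCES movie(movie_id),
    available_for TEXT NOT NULL CHECK (available_for IN ('rent', 'purchase')),
    price_cents   INTEGER NOT NULL CHECK (price_cents >= 0),
    PRIMARY KEY (movie_id, available_for)  -- at most one price per movie and kind
);
INSERT INTO movie VALUES (1, '300');
INSERT INTO movie_offering VALUES (1, 'rent', 250), (1, 'purchase', 1545);
""")


def offerings(conn, movie_id):
    """Return {available_for: price_cents} for one movie."""
    rows = conn.execute(
        "SELECT available_for, price_cents FROM movie_offering WHERE movie_id = ?",
        (movie_id,),
    )
    return dict(rows.fetchall())
```

The movie's own data is stored once, and the composite primary key stops a second "rent" price from being added for the same movie by accident.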
I would suggest you normalize your available_for column, as it is repeated and holds only a few distinct values. Store those values in another table and create a relation between the two tables.
Movie_Available_type
id int, available_for varchar(50)
Then you can use either of the two designs, as pointed out by thoughtarray in the post above.
I would go with:
Movie (movie_id PK, name, purchase_price, rent_price)
and make the pricing columns nullable. If you don't like nulls, you can decompose it into:
Movie (movie_id PK, name)
PurchasePrice (movie_id PK/FK, price)
RentPrice (movie_id PK/FK, price)

Can I set up an integrity constraint to multiple tables on one column?

I have a table which contains a list of components associated with a device.
dev_comps
=========
dev_id comp_type comp_id
------ --------- -------
1 A 1234
1 A 1237
1 A 1238
1 B 5678
1 C 1234
2 A 1235
(yes I can have multiples of each kind of component on the device, otherwise I'd just attach them to the device table instead of having a junction table)
I also have several tables for the types of components (which of course all have slightly different options from each other).
comp_a
======
comp_a_id color style
--------- ----- -----
1234 red glass
1235 blue wood
comp_c
======
comp_c_id style length
--------- ----- ------
1234 glass 5
1235 wood 7
Is it possible to set up an integrity constraint between dev_comps.comp_id and comp_a.comp_a_id; and dev_comps.comp_id and comp_c.comp_c_id; etc.?
In general I don't want my app/users to be able to create an entry in dev_comps where I don't have a matching comp_id in the corresponding comp_* table.
Or, is there a better way to do this?
Edit
I've come to the conclusion that what I'm really doing here is Column Overloading. Which as I recall from my old Intro to DBs course is generally considered a poor design. Is there a clean way out of this?
No, you can't have a foreign key constraint that refers to one of two possible parent tables. There are two common approaches to this sort of issue:
You can combine all the comp_* tables into a single table, adding in a comp_type column. Presumably, this would mean that many attributes would be NULL for every row since each type of component only has a small number of the available attributes. You can add constraints that ensure that the appropriate set of columns are NULL or NOT NULL depending on the comp_type but you'll waste a bit of space. On the other hand, storing NULL values, particularly at the end of the row, is pretty cheap.
You can create a master_component table that just has the comp_id and comp_type. The various comp_* tables could have foreign keys that point to that table (though you'd want comp_type to be added at least as a virtual column). And dev_comps could point at the master_component table.
Of course, there are other options depending on the specifics of the problem. If there are a large number of components with lots of different attributes, such that combining tables would produce a master_component table with hundreds of columns, it may make more sense to have a single master_component table with a component_attribute table that stores attributes as rows rather than as columns. But that's more of a specialized case not something I'd start with.
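The master_component approach could look roughly like the sketch below. It is an illustration under assumptions: it runs on Python's built-in sqlite3 module, it keys the master table on (comp_type, comp_id) because the sample data reuses 1234 for both an A and a C component, it uses a plain comp_type column with a CHECK constraint where the answer suggests a virtual column, and try_attach is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE master_component (
    comp_type TEXT    NOT NULL,
    comp_id   INTEGER NOT NULL,
    PRIMARY KEY (comp_type, comp_id)
);
CREATE TABLE comp_a (
    comp_a_id INTEGER PRIMARY KEY,
    comp_type TEXT NOT NULL DEFAULT 'A' CHECK (comp_type = 'A'),
    color     TEXT,
    style     TEXT,
    FOREIGN KEY (comp_type, comp_a_id)
        REFERENCES master_component (comp_type, comp_id)
);
CREATE TABLE comp_c (
    comp_c_id INTEGER PRIMARY KEY,
    comp_type TEXT NOT NULL DEFAULT 'C' CHECK (comp_type = 'C'),
    style     TEXT,
    length    INTEGER,
    FOREIGN KEY (comp_type, comp_c_id)
        REFERENCES master_component (comp_type, comp_id)
);
CREATE TABLE dev_comps (
    dev_id    INTEGER NOT NULL,
    comp_type TEXT    NOT NULL,
    comp_id   INTEGER NOT NULL,
    FOREIGN KEY (comp_type, comp_id)
        REFERENCES master_component (comp_type, comp_id)
);
INSERT INTO master_component VALUES ('A', 1234), ('C', 1234);
INSERT INTO comp_a (comp_a_id, color, style) VALUES (1234, 'red', 'glass');
INSERT INTO comp_c (comp_c_id, style, length) VALUES (1234, 'glass', 5);
INSERT INTO dev_comps VALUES (1, 'A', 1234), (1, 'C', 1234);
""")


def try_attach(conn, dev_id, comp_type, comp_id):
    """Attach a component to a device; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO dev_comps VALUES (?, ?, ?)",
                (dev_id, comp_type, comp_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Note what this does and does not guarantee: dev_comps can only reference a (comp_type, comp_id) pair registered in master_component, but nothing here forces a matching detail row to exist in comp_a or comp_c.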

Dynamic Validation: 0 or more per field, how to AND or OR validation rules?

Furthering: Database design for dynamic form field validation
How would I model the database when a particular field can have 0 or more validations rules AND that each validation rule is "related" to another rule via AND or OR.
For example, say I have field1 that needs to be minimum of 5 characters AND maximum 10 characters. These are 2 rules that apply to the same field and are related via an "AND." An example of how rules relate via an "OR" would be something like this: field1 should have exactly 5 characters OR exactly 10 characters.
The validation could get complex and have n-levels of nesting. How do I do this in a DB?
I don't think there's a simple answer to how to model this. The following conversation will hopefully get you started, and give you some sense of the issues involved.
So far as I can see, you have at least three types of entity: fields, simple rules, and complex rules (that is, rules made by combining other simple and/or complex rules).
The one piece of good news is that I'm pretty sure you just need two types of complex rule: an AND rule, and an OR rule, each of which applies a set of sub rules, and returns true or false based on the results returned by those subrules.
So you want to build a structure where each form has 1 or more fields, each field has 0 or more validation rules, and each rule has 0 or more sub-rules.
One challenge is just to keep track of the structure of each complex rule. What strikes me as the simplest way to do this is in a tree structure where each node has a parent. So you might have an OR rule with a parent of 0 (indicating that it's a top-level rule). There would then be 2 or more rules with the OR's ruleId as their parent. In turn, any of those might be an AND or OR rule which would be the parent of other rules. And so on down the tree.
Another challenge is how to extract your structure from the db so you can validate a form. It's preferable to minimize the number of queries it takes to do this. In a straight tree, where the structure is only established by children nodes knowing their parents, you'd need a separate query to get each parent's immediate children. So it'd be nice if you could aggregate all the children together under a single ancestor.
If any rule can only be assigned to 1 field, then you can have a fieldId column in your rules table, and each rule will be assigned to a field. Then you can join a form to its fields, and those fields to their rules, and pull out everything in one query. Then the application logic would be responsible for turning the data into a functional tree structure.
However, if you want rules to be reusable, that's not going to work. For example, you might want an abstract zip code rule which combined several sub rules (rather than being a giant regex). And then you might want to make that a US zip code rule, and make another for Canada, and another for any of multiple countries, and then you might want to combine some or all of those depending on which field was being validated. So you might have, for example a US OR Canada zip rule applied to some fields, a US only rule applied to other fields, etc.
One way to do this is to remove the fieldId field from rules, and add a new field_rules junction table with fieldId and ruleId as its columns. However, removing fieldId from rules puts you back into not having a single-query means of extracting all the rules (including sub rules) for a field, never mind for a form. So you might add an origin column to the rules table, and all the subrules of a complex rule would have that top-level rule's id as their origin.
Now things might get even more complex if you want to allow overriding some of a reusable rule's data for specific fields. Then you might add either a new field_rule_data table, or just data columns to the field_rules table.
Implementing a tree structure means that your application logic for both building and applying complex rules is probably going to have to be recursive.
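To give a feel for that recursion, here is a small sketch in Python. Everything in it is assumed for illustration: the row layout (rule_id, parent_id, kind, pattern), the kind names AND, OR and REGEX, and the functions build_tree, evaluate and validate. A parent of 0 marks a top-level rule, as described above.

```python
import re

# Rows as they might come back from one query over a rules table.
# Rule: exactly 5 characters, OR (exactly 10 characters AND digits only).
ROWS = [
    (1, 0, "OR", None),
    (2, 1, "REGEX", r".{5}"),
    (3, 1, "AND", None),
    (4, 3, "REGEX", r".{10}"),
    (5, 3, "REGEX", r"\d+"),
]


def build_tree(rows):
    """Index the flat rows by id and group child ids under their parent id."""
    nodes, children = {}, {}
    for rule_id, parent_id, kind, pattern in rows:
        nodes[rule_id] = (kind, pattern)
        children.setdefault(parent_id, []).append(rule_id)
    return nodes, children


def evaluate(rule_id, value, nodes, children):
    """Recursively apply one rule, descending into AND/OR sub-rules."""
    kind, pattern = nodes[rule_id]
    subs = children.get(rule_id, [])
    if kind == "AND":
        # An AND with no sub-rules passes, an OR with none fails.
        return all(evaluate(s, value, nodes, children) for s in subs)
    if kind == "OR":
        return any(evaluate(s, value, nodes, children) for s in subs)
    return re.fullmatch(pattern, value) is not None


def validate(value, rows):
    """A value is valid when every top-level rule (parent 0) passes."""
    nodes, children = build_tree(rows)
    return all(evaluate(r, value, nodes, children) for r in children.get(0, []))
```

With these rows, "abcde" and "0123456789" pass, while "abcdefghij" fails because it is 10 characters long but not all digits.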
Having said all that, I suspect your real challenge is going to be at the UI level.
Edit
I thought about this some more, and it's seeming even more complicated. I'm sure the following is also inadequate, but I hope it will facilitate figuring out a full answer.
I'm now thinking you have 5 tables: rules, rule_defs, rule_defs_index, fields, field_rules. They go something like this:
Rules
rule_id (PK)
name
data (can be null)
Rule_Defs
rule_def_id (PK)
rule_id (FK to rule_id)
parent (FK to rule_def_id)
origin (FK to rule_def_id: optional convenience field)
Rule_Defs_Index
rule_id (FK)
rule_def_id (FK)
Fields
field_id (PK)
name
Field_Rules
field_id (FK and part of PK)
rule_id (FK and part of PK)
Just making stuff up here in a vaguely plausible way, here's some sample data:
Rules
id name data
1 AND
2 OR
3 5 digits /^\d{5,5}$/
4 5-4 pattern /^\d{5,5}-\d{4,4}$/
5 US Zip
6 6 alphanumerics /^[A-Za-z0-9]{6,6}$/
7 US or Canada Zip
Rule_Defs
id rule_id parent origin
1 5 0 1
2 2 1 1
3 3 2 1
4 4 2 1
5 7 0 5
6 2 5 5
7 5 6 5
8 6 6 5
Rule_Defs_Index (just data for US Canada Zip since that's biggest)
rule_id rule_def_id
7 2
7 3
7 4
7 5
7 6
7 7
Fields
field_id name
1 billing zip
2 shipping zip
Field_Rules
field_id rule_id
1 7
2 7
Note that the assumption here is that creating and editing rules will happen rarely relative to applying rules. Thus creating and editing will be fairly cumbersome and relatively slow activities. To avoid this being the case for the far more common application of rules, the Rule_Defs_Index should make it possible to extract everything needed to build a rule structure for a field (or a form) with a single query. Of course, once it's retrieved, the application will have to do a fair amount of work to turn the data into a useful structure.
Note that you might want to cache the constructed data in serialized form, rebuilding the cache in the relatively rare instances when a rule is edited or created.

Best way to create a unique number for each many to many relationship

I have a table of Students and a table of Courses that are connected through an intermediate table to create a many-to-many relationship (ie. a student can enroll in multiple courses and a course can have multiple students). The problem is that the client wants a unique student ID per course. For example:
rowid Course Student ID (calculated)
1 A Ben 1
2 A Alex 2
3 A Luis 3
4 B Alex 1
5 B Gail 2
6 B Steve 3
The ID's should be numbered from 1 and a student can have a different ID for different course (Alex for example has ID=2 for course A, but ID=1 for Course B). Once an ID is assigned it is fixed and cannot change. I implemented a solution by ordering on the rowid of the through table "SELECT Student from table WHERE Course=A ORDER BY rowid" and then returning a number based on the order of the results.
The problem with this solution, is that if a student leaves a course (is deleted from the table), the numbers of the other students will change. Can someone recommend a better way? If it matters, I'm using PostgreSQL and Django. Here's what I've thought of:
Creating a column for the ID instead of calculating it. When a new relationship is created assigning an ID based on the max(id)+1 of the students in the course
Adding a column "disabled" and setting it True when a student leaves the course. This would involve changing all my code to make sure that only active students are used
I think the first solution is better, but is there a more "database centric way" where the database can calculate this for me automatically?
If you want to have stable IDs, you certainly need to store them in the table.
You'll need to assign a new sequential ID for every student that joins a course and just delete it if the student leaves, without touching others.
If you have concurrent access to your tables, don't use MAX(id), as two queries can select same MAX(id) before inserting it into the table.
Instead, create a separate table to be used as a sequence, lock each course's row with SELECT FOR UPDATE, then insert the new student's ID and update the row with a new ID in a single transaction, like this:
Courses:
Name NextID
------- ---------
Math 101
Physics 201
Attendants:
Student Course Id
------- ------ ----
Smith Math 99
Jones Math 100
Smith Physics 200
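BEGIN TRANSACTION;
-- Lock the course's counter row so concurrent enrollments wait their turn.
SELECT NextID
INTO @NewID
FROM Courses
WHERE Name = 'Math'
FOR UPDATE;
-- Give the new student the reserved ID.
INSERT
INTO Attendants (Student, Course, Id)
VALUES ('Doe', 'Math', @NewID);
-- Advance the counter for the next enrollment.
UPDATE
Courses
SET NextID = @NewID + 1
WHERE Name = 'Math';
COMMIT;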
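The same per-course counter idea can be tried out with the sketch below. It is only an approximation of the transaction above: it uses Python's built-in sqlite3 module, and since SQLite has no SELECT ... FOR UPDATE it takes the write lock up front with BEGIN IMMEDIATE instead. The enroll function and the constraints on Attendants are assumptions for this example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so the
# transaction boundaries below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE Courses (
    Name   TEXT PRIMARY KEY,
    NextID INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE Attendants (
    Student TEXT    NOT NULL,
    Course  TEXT    NOT NULL REFERENCES Courses(Name),
    Id      INTEGER NOT NULL,
    PRIMARY KEY (Course, Student),
    UNIQUE (Course, Id)
);
INSERT INTO Courses (Name) VALUES ('A'), ('B');
""")


def enroll(conn, student, course):
    """Give the student the course's next ID and advance the counter atomically."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading NextID
    try:
        row = conn.execute(
            "SELECT NextID FROM Courses WHERE Name = ?", (course,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown course: {course}")
        new_id = row[0]
        conn.execute(
            "INSERT INTO Attendants (Student, Course, Id) VALUES (?, ?, ?)",
            (student, course, new_id),
        )
        conn.execute(
            "UPDATE Courses SET NextID = ? WHERE Name = ?", (new_id + 1, course)
        )
        conn.execute("COMMIT")
        return new_id
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because the counter only ever moves forward, deleting a student's row leaves everyone else's ID untouched and the freed number is never handed out again.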
Your first suggestion seems good: have a last_id field in the course table that you increase by 1 any time you enroll a student in that course.
> Creating a column for the ID instead of calculating it. When a new relationship is created assigning an ID based on the max(id)+1 of the students in the course
That's how I'd do it. There is no point in calculating it, and the IDs shouldn't change just because someone dropped out.
> Adding a column "disabled" and setting it True when a student leaves the course.
Yes, that would be a good idea. Another one is creating another table of same structure, where you'll store dropped students. Then of course you'll have to select max(id) from union of these two tables.
I think there are two concepts that you need to help you out here.
Sequences where the database gets the next value for an ID for you automatically
Composite keys where more than one column can be combined to make the primary key of a table.
From a quick google it looks like Django can handle sequences but not composite keys, so you will need to emulate that somehow. However you could equally have two foreign keys and a sequence for the course/student relationship
As for how to handle deletions, it depends on what you need from your app, you may find that a status field would help you as you may want to differentiate between students who left and those that were kicked out, or get statistics on how many students leave different courses.
