I have this problem: a concert house where customers book seats (A1, A2, n). Let's say I have these relations:
concertId (foreign key)
Now how would you ensure referential integrity here? In the above example the application will have to update 2 tables for each booking operation. Furthermore, what if I have 5 concert halls instead of one?


How to calculate Jaccard similarity coefficient with sqlite

I have a database made with sqlite3 where each user has 3 possible hobbies, which are saved as a boolean value (1 if the user likes it, 0 if he doesn't).
I want to get a list of the pairs that are similar ordered by their Jaccard similarity coefficient, which means I have to count the number of hobbies that are true for both of them and divide it by the number of hobbies that either of them chose.
I have created this VIEW
All of the pairs must contain wonka in the view. Carros, tecnologia and comida are hobbies.
Instead of storing all hobbies in a single row per user, joining the rows (as your view appears to be doing) and then trying to add them up, it's a lot easier to calculate with a database design that expresses the relationships between users and hobbies in a separate table. (Think about what would need to be done to add a fourth hobby.) You'll want to look up terms like many-to-many relationship and junction table for more, and/or find a good resource on database design.
With a design like that, given these tables:
, liked INTEGER
you can calculate the similarity coefficient for all pairs with something like:
SELECT u1.userName AS "Person 1", u2.userName AS "Person 2"
     , ifnull(total(i1.liked AND i2.liked) / total(i1.liked OR i2.liked), 0.0) AS Similarity
FROM users AS u1
JOIN users AS u2 ON u1.userId <> u2.userId
LEFT JOIN interests AS i1 ON u1.userId = i1.userId
LEFT JOIN interests AS i2 ON u2.userId = i2.userId AND i1.hobbyId = i2.hobbyId
GROUP BY u1.userId, u2.userId;
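To try the query end to end, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (users, hobbies, interests with one row per user and hobby) and the sample users alice and bob are assumptions made up for illustration, not the original poster's schema. It also assumes every user has a row for every hobby, with liked set to 0 or 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one row per (user, hobby) pair with a 0/1 flag.
    CREATE TABLE users (userId INTEGER PRIMARY KEY, userName TEXT);
    CREATE TABLE hobbies (hobbyId INTEGER PRIMARY KEY, hobbyName TEXT);
    CREATE TABLE interests (
        userId INTEGER REFERENCES users(userId),
        hobbyId INTEGER REFERENCES hobbies(hobbyId),
        liked INTEGER,
        PRIMARY KEY (userId, hobbyId)
    );
    INSERT INTO users VALUES (1, 'wonka'), (2, 'alice'), (3, 'bob');
    INSERT INTO hobbies VALUES (1, 'carros'), (2, 'tecnologia'), (3, 'comida');
    INSERT INTO interests VALUES
        (1, 1, 1), (1, 2, 1), (1, 3, 0),   -- wonka: carros, tecnologia
        (2, 1, 1), (2, 2, 0), (2, 3, 0),   -- alice: carros
        (3, 1, 0), (3, 2, 0), (3, 3, 1);   -- bob: comida
""")

# Same query shape as above, restricted to pairs that start with wonka
# and ordered by the similarity column (column 3).
rows = conn.execute("""
    SELECT u1.userName, u2.userName,
           ifnull(total(i1.liked AND i2.liked) / total(i1.liked OR i2.liked), 0.0)
    FROM users AS u1
    JOIN users AS u2 ON u1.userId <> u2.userId
    LEFT JOIN interests AS i1 ON u1.userId = i1.userId
    LEFT JOIN interests AS i2 ON u2.userId = i2.userId AND i1.hobbyId = i2.hobbyId
    WHERE u1.userName = 'wonka'
    GROUP BY u1.userId, u2.userId
    ORDER BY 3 DESC
""").fetchall()

similarity = {(a, b): s for a, b, s in rows}
for pair, score in similarity.items():
    print(pair, score)
```

With this sample data wonka and alice share one hobby out of two that either of them likes, so their coefficient is 0.5, while wonka and bob share none.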

How to normalize the schema to BCNF

I am having some issues with normalization. I have a schema REPAYMENT which looks like this:
Now, from what I've gathered, the functional dependencies that hold in the schema are
{borrower_id} --> {name, address, request_date, loan_amount}
{request_date} --> {repayment_date, loan_amount}
{loan_amount} --> {repayment_amount}
(correct me if I'm wrong?)
I'm supposed to normalise the schema to BCNF, but I'm a bit confused. Is the candidate key request_date and borrower_id?
It can be used to register information on the repayments on micro loans. A borrower, his name and address, are identified with a unique borrower_id. Borrowers can have multiple loans at the same time, but each of those loans (specified by loan_amount, repayment_date and repayment_amount) has a different request date. Thus a loan can be identified with the borrower ID and the request date of the loan. The borrower can repay multiple (different) loans on the same date, but each loan can only be repaid once (on one date with one amount). There is a system which for each request date and amount of a loan determines the repayment date and amount to be repaid. The loan amount requested and the repaid amount are not the same since there is an interest rate that applies.
From the definition of candidate key:
In the relational model of databases, a candidate key of a relation is
a minimal superkey for that relation; that is, a set of attributes
such that:
1. The relation does not have two distinct tuples (i.e. rows or records in common database language) with the same values for these
attributes (which means that the set of attributes is a superkey)
2. There is no proper subset of these attributes for which (1) holds (which means that the set is minimal).
Now your question :
Is the candidate key request_date and borrower_id?
Under the functional dependencies listed above, it is a superkey, but not a minimal one. Here's how we compute the candidate key.
Which attribute occurs only on the left-hand side, considering all the FDs?
It's borrower_id. This means that it must be part of every key of the given schema. Now let us compute its closure.
Because of {borrower_id} --> {name, address, request_date, loan_amount}:
closure(borrower_id) = borrower_id, name, address, request_date, loan_amount.
Because of {request_date} --> {repayment_date, loan_amount} and closure(borrower_id) has request_date, this means
closure(borrower_id) = borrower_id, name, address, request_date, loan_amount, repayment_date
And finally because of {loan_amount} --> {repayment_amount} and closure(borrower_id) has loan_amount, this means
closure(borrower_id) = borrower_id, name, address, request_date, loan_amount, repayment_date, repayment_amount
Because the closure of borrower_id contains all the attributes, borrower_id is a key, and since a single attribute is trivially minimal, it is indeed the candidate key and the only one.
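The closure computation can be checked mechanically. Below is a small sketch in Python; the function name closure and the way the FDs are encoded as pairs of sets are my own choices for illustration, and the FDs are the ones stated in the question, taken at face value.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its whole left side is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# FDs exactly as listed in the question.
FDS = [
    ({"borrower_id"}, {"name", "address", "request_date", "loan_amount"}),
    ({"request_date"}, {"repayment_date", "loan_amount"}),
    ({"loan_amount"}, {"repayment_amount"}),
]
ALL_ATTRS = {
    "borrower_id", "name", "address", "request_date",
    "loan_amount", "repayment_date", "repayment_amount",
}

borrower_closure = closure({"borrower_id"}, FDS)
request_closure = closure({"request_date"}, FDS)
print(borrower_closure == ALL_ATTRS)   # borrower_id alone determines everything
print(sorted(request_closure))         # request_date alone does not
```

The closure of request_date stops short of borrower_id, name and address, which is why request_date by itself is not a key.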
Now let us decompose the schema into BCNF. The algorithm is:
Given a schema R:
1. Compute keys for R.
2. Repeat until all relations are in BCNF:
   a. Pick any R' having an FD A --> B that violates BCNF.
   b. Decompose R' into R1(A, B) and R2(A, rest of attributes).
   c. Compute FDs for R1 and R2.
   d. Compute keys for R1 and R2.
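As a rough sketch of that loop in Python (the helper names closure, find_violation and bcnf_decompose are made up for illustration): instead of projecting FDs explicitly, it tests every attribute subset of a relation via its closure, which is exponential but fine for a schema this small. It uses the FDs from the question as given, and because it picks violations in a fixed order it may produce a different, equally valid decomposition than one worked out by hand.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(rel, fds):
    """Return (A, A+ restricted to rel) for a BCNF violation, or None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            lhs = set(combo)
            determined = closure(lhs, fds) & rel
            # Violation: A determines something new but is not a superkey of rel.
            if determined != lhs and determined != rel:
                return lhs, determined
    return None


def bcnf_decompose(rel, fds):
    todo, done = [set(rel)], []
    while todo:
        r = todo.pop()
        violation = find_violation(r, fds)
        if violation is None:
            done.append(r)
        else:
            lhs, determined = violation
            todo.append(determined)              # R1: A plus what A determines
            todo.append(lhs | (r - determined))  # R2: A plus the rest
    return done


FDS = [
    ({"borrower_id"}, {"name", "address", "request_date", "loan_amount"}),
    ({"request_date"}, {"repayment_date", "loan_amount"}),
    ({"loan_amount"}, {"repayment_amount"}),
]
REPAYMENT = {
    "borrower_id", "name", "address", "request_date",
    "loan_amount", "repayment_date", "repayment_amount",
}

relations = bcnf_decompose(REPAYMENT, FDS)
for r in relations:
    print(sorted(r))
```

Every relation it returns passes the same violation check, and together the relations still cover all seven attributes.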
Since {request_date} --> {repayment_date, loan_amount} and request_date is not a key, this FD violates BCNF, so we split the schema into two relations:
Clearly R1 is in BCNF. But R2 is NOT in BCNF, because we missed the following FD:
address --> name
and we know address is not the key, so we split the R2 further as:
Now, clearly both R3 and R4 are in BCNF. Had we not split the R2 further, we end up storing the same combination of address and name for every loan the person takes, which is redundancy.

Is it good practice to assign ranges to userid?

I'm building a database schema for users of my app, and I am thinking of setting the userid value according to user type. So,
buyers: 10001 to 19999
sellers: 20001 to 29999
shippers: 30001 to 39999
Next, I assign unique email addresses to the userid:
Email.......password.......userid
password.......10005 ---> this email belongs to user 10005 (a buyer)
---> this email belongs to user 20008 (a seller)
---> this email belongs to user 30187 (a shipper)
I then have 3 tables for buyers, sellers, and shippers because each may have different attributes:
buyer_table
buyerid....... name....... mother
10005....... John....... Mary
10006....... Chris....... Nancy
seller_table
sellerid....... name....... pet
20008....... Adam....... Dog
20018....... Tony....... Cat
shipper_table
shipperid....... name....... car
30187....... George....... GMC
30188....... Larry....... Honda
The advantage here is that I have a single login_table for all user types. I do not want to have 3 login tables for each type. Based on the userid value I know what type of user it is. Keeping three tables for each user (buyer_table, seller_table, and shipper_table) is good for making the schema more understandable, in addition to being able to assign different attributes to each user type.
Sounds good? Maybe.
However, I have a problem in that the login_table refers to “userid” while the three user tables each have a different name for the user id: in the buyer_table I have buyerid as primary key, in the seller_table it is sellerid, and in the shipper_table it is shipperid.
How can I link these three primary keys to the login_table? The login_table has userid as a foreign key to one of those three tables, but it is called “userid”, not buyerid, or sellerid, or shipperid!
1) Is it a good idea to classify the userid value according to ranges?
2) If so, how can I resolve the PK-FK issue as described above?
3) Am I off completely?
Having ranges of values for different kinds of similar objects is not bad. If you feel like doing so, you could use sequences which support value ranges. This way, you could have a buyer sequence which goes from 0 to 1000, a seller one from 1001 to 2000 and so on. That would also help you keep track of the increasing index of the different kinds!
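Some databases provide this directly (PostgreSQL's CREATE SEQUENCE accepts MINVALUE and MAXVALUE, for instance). To illustrate the idea independently of any particular database, here is a small Python sketch that emulates one bounded sequence per user type, using the ranges from the question. The class and function names are hypothetical.

```python
# Ranges taken from the question; everything else here is illustrative.
RANGES = {
    "buyer": (10001, 19999),
    "seller": (20001, 29999),
    "shipper": (30001, 39999),
}


class RangeSequence:
    """Hands out increasing ids and refuses to leave its range."""

    def __init__(self, start, stop):
        self._next = start
        self._stop = stop

    def next_id(self):
        if self._next > self._stop:
            raise OverflowError("id range exhausted")
        value = self._next
        self._next += 1
        return value


def user_type(userid):
    """Recover the user type from the id, as the question proposes."""
    for kind, (low, high) in RANGES.items():
        if low <= userid <= high:
            return kind
    raise ValueError(f"userid {userid} is outside every known range")


sequences = {kind: RangeSequence(low, high) for kind, (low, high) in RANGES.items()}
first_buyer = sequences["buyer"].next_id()
second_buyer = sequences["buyer"].next_id()
print(first_buyer, second_buyer, user_type(20008), user_type(30187))
```

Note the OverflowError branch: with ranges this size, each user type is capped at 9999 ids, which is worth keeping in mind before committing to the scheme.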

How do I normalize data in a database?

I'm in an intro to database management course and we're learning about normalizing data (1NF, 2NF, 3NF, etc.) and I'm super confused about how to actually go about doing it. I've read up on this, consulted various sites and YouTube videos and I still can't seem to get it to click. I am using Microsoft Access 2013 if that's of any help.
This is the data I'm working with.
Edit1: Alright, I think I have the tables set up correctly. But now I'm having trouble actually inputting data to go from one table to the next. Here's my relationship table.
On a very basic level, any repeating values in a table are candidates for normalization. Duplicated data is usually a bad idea. Say you needed to update a patient's surname - you now have to update all the occurrences in this table, and possibly many others throughout the rest of the database. Much better to store each patient's details in one place only.
This is where normalization comes in. Looking down the columns, you can see that there are repeating values for data about dentists, patients and surgeries, so we should normalize towards having tables for each of these entities, as well as the original table that contains appointments, giving you four tables in total.
Extract the entities out into their own tables, and give each row a primary (unique) key - just use an incrementing integer for now. (Edit: as suggested in the comment we could use the natural keys of PatientNo, StaffNo and SurgeryNo instead of creating surrogates.)
Then, instead of each patient's name and number appearing multiple times in the appointments table, we just reference the key of the master record in the Patient table. This is called a foreign key.
Then, do the same for Dentist and Surgery.
You will end up with tables looking something like this:
Appointment
AppointmentID  DentistID  PatientID  AppointmentTime  SurgeryID
1              1          1          12 Aug 03 10:00  1
2              1          2          ...              2
3              2          3          ...              1
4              2          3          ...              1
5              3          2          ...              2
6              3          4          ...              3

Dentist
DentistID  Name           StaffNo
1          Tony Smith     S1011
2          Helen Pearson  S1024
3          Robin Plevin   S1032

Patient
PatientID  Name           PatientNo
1          Gillian White  P100
2          Jill Bell      P105
3          Ian MackKay    P108
4          John Walker    P110

Surgery
SurgeryID  SurgeryNo
1          S10
2          S15
3          S13
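If it helps to see the four tables wired together, here is a minimal sketch. It uses Python's built-in sqlite3 rather than Access purely so it can be run as-is; the column types and constraints are assumptions, and only the first row of each table is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE Dentist (DentistID INTEGER PRIMARY KEY, Name TEXT, StaffNo TEXT UNIQUE);
    CREATE TABLE Patient (PatientID INTEGER PRIMARY KEY, Name TEXT, PatientNo TEXT UNIQUE);
    CREATE TABLE Surgery (SurgeryID INTEGER PRIMARY KEY, SurgeryNo TEXT UNIQUE);
    CREATE TABLE Appointment (
        AppointmentID   INTEGER PRIMARY KEY,
        DentistID       INTEGER NOT NULL REFERENCES Dentist(DentistID),
        PatientID       INTEGER NOT NULL REFERENCES Patient(PatientID),
        AppointmentTime TEXT,
        SurgeryID       INTEGER NOT NULL REFERENCES Surgery(SurgeryID)
    );
    INSERT INTO Dentist VALUES (1, 'Tony Smith', 'S1011');
    INSERT INTO Patient VALUES (1, 'Gillian White', 'P100');
    INSERT INTO Surgery VALUES (1, 'S10');
    INSERT INTO Appointment VALUES (1, 1, 1, '12 Aug 03 10:00', 1);
""")

# Joining the keys back together reproduces a row of the original flat table.
row = conn.execute("""
    SELECT d.Name, p.Name, s.SurgeryNo, a.AppointmentTime
    FROM Appointment a
    JOIN Dentist d ON d.DentistID = a.DentistID
    JOIN Patient p ON p.PatientID = a.PatientID
    JOIN Surgery s ON s.SurgeryID = a.SurgeryID
    WHERE a.AppointmentID = 1
""").fetchone()
print(row)

# An appointment that points at a dentist who does not exist is rejected.
try:
    conn.execute("INSERT INTO Appointment VALUES (2, 99, 1, '12 Aug 03 11:00', 1)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print(fk_enforced)
```

The rejected insert is the point of the foreign keys: the appointment table can only reference dentists, patients and surgeries that actually exist.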
The first step in data modelling and normalization is to understand your data. Study it and understand the domain "objects" or tables that exist within your model. That will give you an idea of how to start. Sometimes a single table or query sample is not enough to fully understand the database, but in your case, we can use the sample data and make some assumptions.
Secondly, look for repeated / redundant data. If you see copies of names, there is a good chance that is a candidate for a foreign key. Our assumption tells us that STAFF_NO is a primary key candidate for DENTIST because each unique STAFF_NO correlates to a unique DENTIST_NAME, so I see a good candidate DENTIST table (STAFF_NO, DENTIST_NAME)
Example in some table of SURGERY:
1 1 Fred Sanford
2 1 Fred Sanford
3 3 Lamont Sanford
4 3 Lamont Sanford
Why store these over and over? What happens when Fred says "But my correct name is Fred G Sanford", so you have to update your database? In the current table, you have to update the name in many rows. If you had normalized it, you'd have a single location for the name, in the DENTIST table.
So I can take the unique dentists and store them in DENTIST
create table DENTIST(staff_no integer primary key, dentist_name varchar(100));
-- One possible way to populate our dentist table is to use a distinct query from surgery
insert into DENTIST
select distinct staff_no, dentist_name from surgery;
1 Fred Sanford
3 Lamont Sanford
SURGERY table now points to DENTIST table
1 1
2 1
3 3
4 3
And you can now create a view, VIEW_SURGERY to join the DENTIST_NAME back in to satisfy the needs of typical queries.
create view view_surgery as
select s.*, d.dentist_name -- s.* supplies the surgery key and staff_no
from surgery s join dentist d
on s.staff_no = d.staff_no; -- join here
So now a unique update to DENTIST, by the dentist primary key will update a single row.
update dentist set dentist_name = 'Fred G Sanford' where staff_no = 1;
And querying the view will show the updated name for N rows:
select * from view_surgery
1 1 Fred G Sanford
2 1 Fred G Sanford
3 3 Lamont Sanford
4 3 Lamont Sanford
In short, you are removing redundancy.
This is just a sample, and one way to do it. Manual normalization like this is not as common when you have modelling tools, but the point is, we can look at data, spot redundancies and factor those redundancies into new tables, and relate those new tables by foreign keys and joins, then build views to represent the original data.
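For anyone who wants to run the whole sequence in one go, here is a sketch using Python's built-in sqlite3. The column name surgery_id and the staging table surgery_raw are hypothetical names I've introduced for the unnormalized sample rows; the rest follows the statements above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized starting point; surgery_id is a hypothetical column name.
    CREATE TABLE surgery_raw (
        surgery_id   INTEGER PRIMARY KEY,
        staff_no     INTEGER,
        dentist_name VARCHAR(100)
    );
    INSERT INTO surgery_raw VALUES
        (1, 1, 'Fred Sanford'), (2, 1, 'Fred Sanford'),
        (3, 3, 'Lamont Sanford'), (4, 3, 'Lamont Sanford');

    -- Factor the repeated dentist data into its own table.
    CREATE TABLE dentist (staff_no INTEGER PRIMARY KEY, dentist_name VARCHAR(100));
    INSERT INTO dentist SELECT DISTINCT staff_no, dentist_name FROM surgery_raw;

    -- The normalized surgery table keeps only the foreign key.
    CREATE TABLE surgery (
        surgery_id INTEGER PRIMARY KEY,
        staff_no   INTEGER REFERENCES dentist(staff_no)
    );
    INSERT INTO surgery SELECT surgery_id, staff_no FROM surgery_raw;

    -- The view joins the name back in for typical queries.
    CREATE VIEW view_surgery AS
        SELECT s.surgery_id, d.staff_no, d.dentist_name
        FROM surgery s JOIN dentist d ON s.staff_no = d.staff_no;

    -- One-row update in dentist...
    UPDATE dentist SET dentist_name = 'Fred G Sanford' WHERE staff_no = 1;
""")

# ...shows up in every matching row of the view.
rows = conn.execute("SELECT * FROM view_surgery ORDER BY surgery_id").fetchall()
dentist_count = conn.execute("SELECT count(*) FROM dentist").fetchone()[0]
for r in rows:
    print(r)
```

The dentist table ends up with two rows for four surgeries, and the single UPDATE changes the name everywhere the view reports it.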

Database relation many to many

This is basically my database structure.
One product (let's say soap) will have many retail selling sizes:
1 liter
4 liters
20 liters
In my "produit" table I will have the soap item (id #1).
In the size table I will have many sizes available:
How do I avoid duplicating the product 3 times, once per size? I'd like to have check boxes on the product for all the sizes available in the database, each checked yes or no (boolean).
The answer I got is perfect, but how do I get an option like this:
soap [x] 1 liter , [ ] 4 liter , [x] 20 liter
I'm not sure I understand your exact scenario, but to create a many-to-many relationship, you simply create a "relationship table", in which you store id's for the two records you want to link.
ProductID (PK)
RetailerID (PK)
A many-to-many relationship is almost always modeled using an intermediate table. For your example,
The Size table would contain particular sizes (say, 10 liter) and the Product_Size table creates a Product and Size pairing.
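Here is one way the pairing table could drive the checkbox display, sketched with Python's built-in sqlite3. The table and column names (product, size, product_size, label) are assumptions for illustration; the idea is to list every size for the product and mark the ones that have a matching row in the pairing table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE size (size_id INTEGER PRIMARY KEY, label TEXT);
    -- Junction table: one row per product/size pairing that is offered.
    CREATE TABLE product_size (
        product_id INTEGER REFERENCES product(product_id),
        size_id    INTEGER REFERENCES size(size_id),
        PRIMARY KEY (product_id, size_id)
    );
    INSERT INTO product VALUES (1, 'soap');
    INSERT INTO size VALUES (1, '1 liter'), (2, '4 liter'), (3, '20 liter');
    INSERT INTO product_size VALUES (1, 1), (1, 3);  -- soap comes in 1 and 20 liter
""")

# Every product paired with every size; the LEFT JOIN tells us which pairs exist.
rows = conn.execute("""
    SELECT p.name, s.label, ps.product_id IS NOT NULL AS available
    FROM product p
    CROSS JOIN size s
    LEFT JOIN product_size ps
           ON ps.product_id = p.product_id AND ps.size_id = s.size_id
    ORDER BY p.product_id, s.size_id
""").fetchall()

boxes = " , ".join(f"[{'x' if available else ' '}] {label}" for _, label, available in rows)
line = f"{rows[0][0]} {boxes}"
print(line)
```

Ticking or unticking a box in the UI then maps to inserting or deleting a single product_size row, and the product itself is never duplicated.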
You will need an intermediary, or "join", table.
One record for each product-size combination
Based on the answers, here is the database table layout as proposed. It looks complicated to me. Are you sure this is the way to do it, the BEST solution?
