Best structure for this database? - database

The Problem: I want to design a website and I need a database for that, however I don't know which structure is better!
What will happen: Users will add some URIs to their favorites.
Possible structures:
Structure one:
TABLE "USERS":
=====================================================================
id | name | last_name | urls
1 | John | Smith | [google.com,stackoverflow.com,yahoo.com,...]
=====================================================================
Structure two:
TABLE "USERS":
=============================================
id | name | last_name
1 | John | Smith
2 | Joe | Roth
==============================================
TABLE "URLS":
==============================================
id | user_id | url
1 | 1 | google.com
2 | 1 | stackoverflow.com
3 | 2 | ask.com
4 | 1 | yahoo.com
5 | 2 | being.com
==============================================
Which structure is best? Thanks in advance!

The second schema (with a foreign key constraint on URLS.user_id) will be easier to manage. A select query will need to use a join and you'll be performing more inserts, but you won't need to perform string parsing to figure out what the urls are (which, with the first schema, you would need for every single select and update).
One of the tables in the production database at my current job has a schema similar to the first case, and it makes my entire department cringe and complain when they have to write code working with it. Yet the table remains, because fixing it would be a massive overhaul. (The table was created by a manager who no longer works here.) If you're creating your schema now, do it right, while you still can.
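A minimal sketch of the second structure, assuming SQLite through Python's sqlite3 module (the question does not name a database engine). The table and column names follow the question; the helper favorites_of is hypothetical and only there to show the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    last_name TEXT NOT NULL
);
CREATE TABLE urls (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    url     TEXT NOT NULL
);
INSERT INTO users (id, name, last_name) VALUES (1, 'John', 'Smith'), (2, 'Joe', 'Roth');
INSERT INTO urls (user_id, url) VALUES
    (1, 'google.com'), (1, 'stackoverflow.com'), (2, 'ask.com'), (1, 'yahoo.com');
""")


def favorites_of(conn, user_id):
    """Return one user's favorite URLs via a join; no string parsing needed."""
    rows = conn.execute(
        "SELECT f.url FROM users AS u JOIN urls AS f ON f.user_id = u.id "
        "WHERE u.id = ? ORDER BY f.id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Each favorite is its own row, so adding or removing one is a single INSERT or DELETE, and the foreign key rejects a URL that points at a user who does not exist.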

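Absolutely create another table that stores each URL together with a user_id. Your first structure is not in first normal form, because the urls column holds a list of values rather than a single value.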


Ensuring that two column values are related in SQL Server

I'm using Microsoft SQL Server 2017 and was curious about how to constrain a specific relationship. I'm having a bit of trouble articulating so I'd prefer to share through an example.
Consider the following hypothetical database.
Customers
+---------------+
| Id | Name |
+---------------+
| 1 | Sam |
| 2 | Jane |
+---------------+
Addresses
+----------------------------------------+
| Id | CustomerId | Address |
+----------------------------------------+
| 1 | 1 | 105 Easy St |
| 2 | 1 | 9 Gale Blvd |
| 3 | 2 | 717 Fourth Ave |
+------+--------------+------------------+
Orders
+-----------------------------------+
| Id | CustomerId | AddressId |
+-----------------------------------+
| 1 | 1 | 1 |
| 2 | 2 | 3 |
| 3 | 1 | 3 | <--- Invalid Customer/Address Pair
+-----------------------------------+
Notice that the final Order links a customer to an address that isn't theirs. I'm looking for a way to prevent this.
(You may ask why I need the CustomerId in the Orders table at all. To be clear, I recognize that the Address already offers me the same information without the possibility of invalid pairs. However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.)
From the related reading I was able to find, it seems that one method may be to enable a CHECK constraint targeting a User-Defined Function. This User-Defined Function would be something like the following:
WHERE EXISTS (SELECT 1 FROM Addresses WHERE Id = Order.AddressId AND CustomerId = Order.CustomerId)
While I imagine this would work, the articles I was able to find were fairly general, so I don't feel entirely confident that this is my best option.
An alternative might be to remove the CustomerId column from the Addresses table entirely, and instead add another table with Id, CustomerId, AddressId. The Order would then reference this Id instead. Again, I don't love the idea of having to channel through an auxiliary table to get a Customer or Address.
Is there a cleaner way to do this? Or am I simply going about this all wrong?
Good question. At the root, however, it seems you are struggling to create a foreign key constraint against something that is not a key on its own:
Orders.CustomerId -> Addresses.CustomerId
There is no simple built-in way to do this because it is normally not done. In ideal RDBMS practices you should strive to encapsulate data of specific types in their own tables only. In other words, try to avoid redundant data.
In the example above, address ownership is stored redundantly in both the Addresses table and the Orders table, and that redundancy is what requires additional checks to keep the two synchronized. This can easily get out of hand with bigger datasets.
You mentioned:
However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.
But that is why a relational database is relational. It does this so that distinct data can be kept distinct and referenced with relative IDs.
I think the best solution would be to simply drop this requirement.
In other words, just go with:
Customers
+---------------+
| Id | Name |
+---------------+
| 1 | Sam |
| 2 | Jane |
+---------------+
Addresses
+----------------------------------------+
| Id | CustomerId | Address |
+----------------------------------------+
| 1 | 1 | 105 Easy St |
| 2 | 1 | 9 Gale Blvd |
| 3 | 2 | 717 Fourth Ave |
+------+--------------+------------------+
Orders
+--------------------+
| Id | AddressId |
+--------------------+
| 1 | 1 |
| 2 | 3 |
| 3 | 3 | <--- Valid Order/Address Pair
+--------------------+
With that said, to accomplish your purpose exactly, you do have views available for this kind of thing:
create view CustomerOrders
as
select o.Id OrderId,
a.CustomerId,
o.AddressId
from Orders o
join Addresses a on a.Id = o.AddressId
I know this is a pretty trivial use-case for a view but I wanted to put in a plug for it because they are often neglected and come in handy with organizing big data sets. Using WITH SCHEMABINDING they can also be indexed for performance.
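As a rough illustration of the view approach, here is a sketch that assumes SQLite through Python's sqlite3 module rather than SQL Server, so WITH SCHEMABINDING and indexed views are not shown. Table names and sample rows follow the answer above; the variable customer_orders is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Addresses (
    Id         INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customers(Id),
    Address    TEXT
);
CREATE TABLE Orders (
    Id        INTEGER PRIMARY KEY,
    AddressId INTEGER NOT NULL REFERENCES Addresses(Id)
);
INSERT INTO Customers VALUES (1, 'Sam'), (2, 'Jane');
INSERT INTO Addresses VALUES
    (1, 1, '105 Easy St'), (2, 1, '9 Gale Blvd'), (3, 2, '717 Fourth Ave');
INSERT INTO Orders VALUES (1, 1), (2, 3), (3, 3);

-- The customer is derived from the address, so the pair can never disagree.
CREATE VIEW CustomerOrders AS
SELECT o.Id AS OrderId, a.CustomerId, o.AddressId
FROM Orders AS o
JOIN Addresses AS a ON a.Id = o.AddressId;
""")

# Reads look "flattened" even though Orders itself stores no CustomerId.
customer_orders = conn.execute(
    "SELECT OrderId, CustomerId, AddressId FROM CustomerOrders ORDER BY OrderId"
).fetchall()
```

Queries against CustomerOrders get the flattened shape the question asks for, while the base tables stay free of the redundant column.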
You may ask why I need the CustomerId in the Orders table at all. To be clear, I recognize that the Address already offers me the same information without the possibility of invalid pairs. However, I'd prefer to have an Order flattened such that I don't have to channel through an address to retrieve a customer.
If you face performance problems, the first thing to do is create or amend proper indexes. DBMSs are usually good at join operations (given proper indexes). Yes, denormalization can sometimes help in performance tuning, but it should be a last resort. If that route is taken, one should really know what one is doing and be very careful not to do more damage in the end than one has gained. I doubt that you're out of options here and really need to go down that path; you're likely barking up the wrong tree. Therefore I recommend you take the "normal", "sane" way: just drop customerid from orders and create proper indexes.
But if you really insist, you can try to make (id, customerid) a key in addresses (with a unique constraint) and then create a foreign key based on that.
ALTER TABLE addresses
ADD UNIQUE (id,
customerid);
ALTER TABLE orders
ADD FOREIGN KEY (addressid,
customerid)
REFERENCES addresses
(id,
customerid);
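To see the composite key idea in action, here is a small sketch that assumes SQLite through Python's sqlite3 module instead of SQL Server. The constraints are declared inline at table creation rather than with ALTER TABLE, and the helper try_order is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE addresses (
    id         INTEGER PRIMARY KEY,
    customerid INTEGER NOT NULL REFERENCES customers(id),
    address    TEXT,
    UNIQUE (id, customerid)          -- makes the pair a valid foreign key target
);
CREATE TABLE orders (
    id         INTEGER PRIMARY KEY,
    customerid INTEGER NOT NULL,
    addressid  INTEGER NOT NULL,
    FOREIGN KEY (addressid, customerid) REFERENCES addresses (id, customerid)
);
INSERT INTO customers VALUES (1, 'Sam'), (2, 'Jane');
INSERT INTO addresses VALUES
    (1, 1, '105 Easy St'), (2, 1, '9 Gale Blvd'), (3, 2, '717 Fourth Ave');
INSERT INTO orders VALUES (1, 1, 1), (2, 2, 3);
""")


def try_order(conn, order_id, customer_id, address_id):
    """Insert an order; return False if the customer/address pair is rejected."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (id, customerid, addressid) VALUES (?, ?, ?)",
                (order_id, customer_id, address_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, try_order(conn, 3, 1, 3) fails because address 3 belongs to customer 2, whereas the same order with address 2 is accepted.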

How to update data in a Many-to-Many linking table?

For simplicity's sake, let's assume there's a Post table and a Tags table (not the actual use case, but this will keep it simple).
posts Table
id | title
--------------------------------
1 | Random Text Here
2 | Another Post About Stuff
tags Table
id | tag
--------------------------------
1 | javascript
2 | node
3 | unrelated-thing
posts_tags table
id| post_id | tag_id
--------------------------------
1 | 1 | 1
2 | 1 | 2
3 | 1 | 3
4 | 2 | 2
A Post can have many Tags and a single Tag could be associated with many Posts.
Web App Assumptions: Let's pretend that adding or removing a Tag doesn't trigger its own asynchronous action within the web app against the linking table.
Instead the user would edit the Post (adding or removing any tags already created) then hit Save. The web app would submit JSON including an array of Tags ids associated with the Post to the server which would then process the update request in the code.
For example, post_id=1 is submitted with only tag_id=[1,2] so tag=3 needs to be removed as an association in the linking table.
If a Post or a Tag is deleted, I'd have an ON DELETE CASCADE set on
posts_tags.post_id
posts_tags.tag_id
But what is the best way to update the linking table data in the instance of updating the tags associated with a post?

Option 1:
Get all the Post-Tags for the edited Post 
SELECT * FROM posts_tags WHERE post_id = 1
Determine which tags have been added (and INSERT into linking table)
Determine which tags have been removed (and DELETE from linking table)
Option 2:
Delete ALL tags with the post_id in the linking table
Insert all submitted tags into the linking table
Option 3:
Something I'm not thinking about :)
Would Option 2 have a bigger performance impact on indexes as the table grows?
EDIT:
For clarity, the actual Post and Tag data isn't changed or removed. This is purely about Updating a post's associated tags
The database I'm using is PostgreSQL 9.6
Option 2 would be fine from a performance point of view, and much better than option 1: a single operation deletes the old associations, followed by a batch of insert statements. Option 1 needs more queries (the first query to retrieve the associations, then the inserts, and then the deletes if applicable).
As long as your table has an index on post_id, then delete from posts_tags where post_id = ? will be lightning fast, even on a huge table.
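Option 2 could look roughly like the sketch below. It assumes SQLite through Python's sqlite3 module instead of PostgreSQL 9.6, and a composite primary key on (post_id, tag_id) in place of the surrogate id column, which also provides the index on post_id. The helpers replace_tags and tags_of are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts_tags (
    post_id INTEGER NOT NULL,
    tag_id  INTEGER NOT NULL,
    PRIMARY KEY (post_id, tag_id)    -- leading column gives the post_id index
);
INSERT INTO posts_tags VALUES (1, 1), (1, 2), (1, 3), (2, 2);
""")


def replace_tags(conn, post_id, tag_ids):
    """Delete every association for the post, then insert the submitted set."""
    # The connection context manager commits on success and rolls back on
    # an exception, so readers never see the post with a half-written tag set.
    with conn:
        conn.execute("DELETE FROM posts_tags WHERE post_id = ?", (post_id,))
        conn.executemany(
            "INSERT INTO posts_tags (post_id, tag_id) VALUES (?, ?)",
            [(post_id, tag_id) for tag_id in sorted(set(tag_ids))],
        )


def tags_of(conn, post_id):
    rows = conn.execute(
        "SELECT tag_id FROM posts_tags WHERE post_id = ? ORDER BY tag_id",
        (post_id,),
    )
    return [row[0] for row in rows]
```

Running the delete and the inserts in one transaction matters more than the raw speed: if an insert fails, the old associations are restored instead of being lost.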
There is an alternative...
posts_tags table
id| post_id | tag_id | version_id
--------------------------------
1 | 1 | 1 | 0
2 | 1 | 2 | 0
3 | 1 | 3 | 1
4 | 2 | 2 | 0
5 | 1 | 1 | 2
6 | 1 | 3 | 2
In this case, you use a versioning mechanism to determine the "current" associations (max(version_id)), so you never have to delete anything - you just insert new rows.
In practice, this is probably no faster, but it does save you that "delete" query.
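The versioning idea can be sketched as follows, again assuming SQLite through Python's sqlite3 module. The rows are the ones from the table above; the helpers current_tags and save_tags are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts_tags (
    id         INTEGER PRIMARY KEY,
    post_id    INTEGER NOT NULL,
    tag_id     INTEGER NOT NULL,
    version_id INTEGER NOT NULL
);
INSERT INTO posts_tags (post_id, tag_id, version_id) VALUES
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (2, 2, 0), (1, 1, 2), (1, 3, 2);
""")


def current_tags(conn, post_id):
    """Tags whose version_id equals the highest version stored for the post."""
    rows = conn.execute(
        "SELECT tag_id FROM posts_tags "
        "WHERE post_id = ? AND version_id = "
        "      (SELECT MAX(version_id) FROM posts_tags WHERE post_id = ?) "
        "ORDER BY tag_id",
        (post_id, post_id),
    )
    return [row[0] for row in rows]


def save_tags(conn, post_id, tag_ids):
    """Insert the submitted tags under the next version; nothing is deleted."""
    with conn:
        (latest,) = conn.execute(
            "SELECT COALESCE(MAX(version_id), -1) FROM posts_tags WHERE post_id = ?",
            (post_id,),
        ).fetchone()
        conn.executemany(
            "INSERT INTO posts_tags (post_id, tag_id, version_id) VALUES (?, ?, ?)",
            [(post_id, tag_id, latest + 1) for tag_id in tag_ids],
        )
```

One limitation of this sketch: saving an empty tag list inserts no rows, so the previous version would still count as current. A real design would need a separate per-post version record to handle that case.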

Relationships Between Tables in MS Access

I'm new to databases and am having some difficulty setting up relationships between 3 tables in MS Access 2013.
The idea is that I have a table with account info, a table with calls related to these accounts, and a table with all the possible call responses. I tried different combinations between them, but nothing works.
1st table - Accounts : AccountID(PK) | AccountName | Language | Country | Email
2nd table - Calls : CallID(PK) | Account | Response | Comment | Date
3rd table - Responses: ResponseID(PK) | Response
When you have a table, it usually has a Primary Key field that is the main index of the table. In order for you to connect it with other tables, you usually do that by setting Foreign Key on the other table.
Let's say you have your Accounts table, and it has AccountID field as Primary Key. This field is unique (meaning no duplicate value for this field).
Now, you have the other table called Calls and you have a Foreign Key field called AccountID there, which points to the Accounts table.
Essentially you have Accounts with the following data:
AccountID| AccountName | Language | Country | Email
1 | FirstName | EN | US | some#email.com
2 | SecondName | EN | US | some#email.com
Now you have the other table Calls with Many calls
CallID(PK) | AccountID(FK) | ResponseID(FK) | Comment | Date
1 | 1 | 1 | a comment | 26/10
2 | 1 | 1 | a comment | 26/10
3 | 2 | 3 | a comment | 26/10
4 | 2 | 3 | a comment | 26/10
You can see the One to Many relationship: One accountID (in my example AccountID=1) to Many Calls (in my example 2 rows with AccountID=1 as foreign keys, rows 1 & 2) and AccountID=2 has also 2 rows of Calls (rows 3 and 4)
Same goes for the Responses table
Using this table structure:
Accounts : AccountID(PK) | AccountName | Language | Country | Email
Calls : CallID(PK) | AccountID(FK) | ResponseID(FK) | Comment | Date
Responses: ResponseID(PK) | Response
Accounts.AccountID is referenced by Calls.AccountID. 1:n – many calls for one account possible, but each call concerns just one account.
Responses.ResponseID is referenced by Calls.ResponseID. 1:n – many calls can get the same response from the prepared set, but each call gets exactly one of them.
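The two 1:n relationships can be tried out with the sketch below. It assumes SQLite through Python's sqlite3 module rather than Access, so the DDL will differ from what Access generates. The response texts ('Answered', 'No answer', 'Call back') are made-up sample values, the Date column is left empty, and call_report is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Accounts (
    AccountID   INTEGER PRIMARY KEY,
    AccountName TEXT,
    Language    TEXT,
    Country     TEXT,
    Email       TEXT
);
CREATE TABLE Responses (ResponseID INTEGER PRIMARY KEY, Response TEXT);
CREATE TABLE Calls (
    CallID     INTEGER PRIMARY KEY,
    AccountID  INTEGER NOT NULL REFERENCES Accounts(AccountID),    -- 1:n
    ResponseID INTEGER NOT NULL REFERENCES Responses(ResponseID),  -- 1:n
    Comment    TEXT,
    Date       TEXT
);
INSERT INTO Accounts (AccountID, AccountName, Language, Country)
VALUES (1, 'FirstName', 'EN', 'US'), (2, 'SecondName', 'EN', 'US');
-- Hypothetical response texts; the question does not list them.
INSERT INTO Responses VALUES (1, 'Answered'), (2, 'No answer'), (3, 'Call back');
INSERT INTO Calls (CallID, AccountID, ResponseID, Comment) VALUES
    (1, 1, 1, 'a comment'), (2, 1, 1, 'a comment'),
    (3, 2, 3, 'a comment'), (4, 2, 3, 'a comment');
""")


def call_report(conn):
    """Resolve both foreign keys so each call shows its account and response."""
    return conn.execute(
        "SELECT c.CallID, a.AccountName, r.Response "
        "FROM Calls AS c "
        "JOIN Accounts AS a ON a.AccountID = c.AccountID "
        "JOIN Responses AS r ON r.ResponseID = c.ResponseID "
        "ORDER BY c.CallID"
    ).fetchall()
```

Calls sits on the "many" side of both relationships, which is why it is the table that carries both foreign key columns.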
To actually define the Relationships in Access, open the Relationships window...
... then follow the detailed instructions here:
How to define relationships between tables in an Access database

One large database table for my "list" or a new table for each "list"

I'm trying to create a database system where users can create lists, and their friends (that they allow list access to) can add to the list.
I'm trying to map out this schema and can't decide between the following:
List Table
With attributes:
listid, entry-number, entry, user-id
Where listid is the list being changed, entry-number is the number of that entry in the list (so the first item in a list is entry 0), entry is the entry on the list, and user-id is the user who added the entry
VS
Specific List Table
where a specific table is made for each list with attributes
entry-number, entry, user-id
It seems like the 2nd option makes it much easier to get information/change a list once we find the table, whereas the first one is much easier to understand.
I'm just getting into databases so I want to pick the schema correctly.
Thanks!
The first option is more maintainable and normalised. It would be easier to query this option and create applications that use the list.
From what I can see, you should have two tables. One that holds user information and one that holds list information.
Users will have userID (PK) and your List table would have listID (PK) and a userID as FK to reference which user the list belongs to. So yeah, your first choice :)
GL
For example:
User
userID(PK) | username | etc
--------------------------
1 | Bob | etc
2 | Nick | etc
Lists
listID(PK) | userID(FK) | date_entered | entry
----------------------------------------------
1 | 2 | 1/2/2011 | blah blah
2 | 2 | 2/1/2011 | blah blah
3 | 1 | 2/3/2011 | blah blah
4 | 2 | 2/6/2011 | blah blah
5 | 1 | 3/1/2011 | blah blah
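// you should already know the userID of the user whose lists you are looking up (C#)
// bind @userID as a command parameter instead of concatenating it; placeholder syntax varies by data provider
string query = "SELECT * FROM Lists WHERE userID = @userID ORDER BY date_entered DESC";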
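The same lookup can be sketched with a bound parameter, assuming SQLite through Python's sqlite3 module. The dates are stored as ISO strings so that they sort correctly, which assumes the sample dates above are month/day/year. The helper entries_for_user is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (userID INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE Lists (
    listID       INTEGER PRIMARY KEY,
    userID       INTEGER NOT NULL REFERENCES User(userID),
    date_entered TEXT NOT NULL,      -- ISO 8601 text sorts chronologically
    entry        TEXT
);
INSERT INTO User VALUES (1, 'Bob'), (2, 'Nick');
INSERT INTO Lists VALUES
    (1, 2, '2011-01-02', 'blah blah'),
    (2, 2, '2011-02-01', 'blah blah'),
    (3, 1, '2011-02-03', 'blah blah'),
    (4, 2, '2011-02-06', 'blah blah'),
    (5, 1, '2011-03-01', 'blah blah');
""")


def entries_for_user(conn, user_id):
    """Newest-first entries for one user; the value is bound, never concatenated."""
    return conn.execute(
        "SELECT listID, date_entered, entry FROM Lists "
        "WHERE userID = ? ORDER BY date_entered DESC",
        (user_id,),
    ).fetchall()
```

Because every list lives in the same table, fetching one user's entries is a single filtered query, and no table names ever need to be built at runtime.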

DB Data migration

I have a database table called A, and I have now created a new table called B and moved some of A's columns into table B.
E.g., suppose the tables have the following columns:
Table A // The one that already exists
Id, Country, Age, Firstname, Middlename, Lastname
Table B // The new table I created
Id, Firstname, Middlename, Lastname
Now table A will look like this:
Table A // new table A after the modification
Id, Country, Age, Name
In this case the Name column maps to table B.
My problem is that I now need to maintain the reports that were generated before the table modifications, and a friend told me I need a data migration. What is a data migration, and how does it work?
Thank you.
Update
I forgot to address the reporting issue raised by the OP (Thanks Mark Bannister). Here is a stab at how to deal with reporting.
In the beginning (before data migration) a report to generate the name, country and age of users would use the following SQL (more or less):
-- This query orders users by their Lastname
SELECT Lastname, Firstname, Age, Country FROM tableA order by Lastname;
The name related fields are no longer present in tableA post data migration. We will have to perform a join with tableB to get the information. The query now changes to:
SELECT b.Lastname, b.Firstname, a.Country, a.Age FROM tableA a, tableB b
WHERE a.name = b.id ORDER BY b.Lastname;
I don't know how exactly you generate your report but this is the essence of the changes you will have to make to get your reports working again.
Original Answer
Consider the situation when you had only one table (table A). A couple of rows in the table would look like this:
# Picture 1
# Table A
------------------------------------------------------
Id | Country | Age | Firstname | Middlename | Lastname
1 | US | 45 | John | Fuller | Doe
2 | UK | 32 | Jane | Margaret | Smith
After you add the second table (table B) the name related fields are moved from table A to table B. Table A will have a foreign key pointing to the table B corresponding to each row.
# Picture 2
# Table A
------------------------------------------------------
Id | Country | Age | Name
1 | US | 45 | 10
2 | UK | 32 | 11
# Table B
------------------------------------------------------
Id | Firstname | Middlename | Lastname
10 | John | Fuller | Doe
11 | Jane | Margaret | Smith
This is the final picture. The catch is that the data will not move from table A to table B on its own. Alas human intervention is required to accomplish this. If I were the said human I would follow the steps given below:
1. Create table B with columns Id, Firstname, Middlename and Lastname. You now have two tables, A and B. A has all the existing data; B is empty.
2. Add a foreign key to table A. This FK will be called name and will reference the id field of table B.
3. For each row in table A, create a new row in table B using the Firstname, Middlename and Lastname fields taken from table A.
4. After copying each row, update the name field of table A with the id of the newly created row in table B.
The database now looks like this:
# Table A
-------------------------------------------------------------
Id | Country | Age | Firstname | Middlename | Lastname | Name
1 | US | 45 | John | Fuller | Doe | 10
2 | UK | 32 | Jane | Margaret | Smith | 11
# Table B
------------------------------------------------------
Id | Firstname | Middlename | Lastname
10 | John | Fuller | Doe
11 | Jane | Margaret | Smith
Now you no longer need the Firstname, Middlename and Lastname columns in table A so you can drop them.
Voilà, you have performed a data migration!
The process I just described above is but a specific example of a data migration. You can accomplish it in a number of ways using a number of languages/tools. The choice of mechanism will vary from case to case.
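As one possible way to script the migration described above, here is a sketch that assumes SQLite through Python's sqlite3 module. The function migrate and the temporary table A_new are hypothetical. The old name columns are removed by rebuilding table A, which works on any SQLite version; engines with ALTER TABLE ... DROP COLUMN can drop them directly. The new ids in table B come out as 1 and 2 here rather than the 10 and 11 used in the pictures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (
    Id INTEGER PRIMARY KEY, Country TEXT, Age INTEGER,
    Firstname TEXT, Middlename TEXT, Lastname TEXT
);
INSERT INTO A VALUES
    (1, 'US', 45, 'John', 'Fuller', 'Doe'),
    (2, 'UK', 32, 'Jane', 'Margaret', 'Smith');
""")


def migrate(conn):
    # Create the empty table B.
    conn.execute(
        "CREATE TABLE B (Id INTEGER PRIMARY KEY, "
        "Firstname TEXT, Middlename TEXT, Lastname TEXT)"
    )
    # Add the foreign key column Name to table A.
    conn.execute("ALTER TABLE A ADD COLUMN Name INTEGER REFERENCES B(Id)")
    # Copy each name into B, then point the A row at the new B row.
    old_rows = conn.execute(
        "SELECT Id, Firstname, Middlename, Lastname FROM A"
    ).fetchall()
    for a_id, first, middle, last in old_rows:
        cur = conn.execute(
            "INSERT INTO B (Firstname, Middlename, Lastname) VALUES (?, ?, ?)",
            (first, middle, last),
        )
        conn.execute("UPDATE A SET Name = ? WHERE Id = ?", (cur.lastrowid, a_id))
    # Drop the old name columns by rebuilding A without them.
    conn.execute(
        "CREATE TABLE A_new (Id INTEGER PRIMARY KEY, Country TEXT, "
        "Age INTEGER, Name INTEGER REFERENCES B(Id))"
    )
    conn.execute("INSERT INTO A_new SELECT Id, Country, Age, Name FROM A")
    conn.execute("DROP TABLE A")
    conn.execute("ALTER TABLE A_new RENAME TO A")
    conn.commit()
```

After migrate(conn) has run, a report that needs names joins the two tables on A.Name = B.Id. On a production database you would take a backup first and check the row counts before dropping anything.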
Maintenance of the existing reports will depend on the tools used to write / generate those reports. In general:
Identify the existing reports that used table A. (Possibly by searching for files that have the name of table A inside them - however, if table A has a name [eg. Username] which is commonly used elsewhere in the system, this could return a lot of false positives.)
Identify which of those reports used the columns that have been removed from table A.
Amend the existing reports to return the moved columns from table B instead of table A.
A quick way to achieve this is to create a database view that mimics the old structure of table A, and amend the affected reports to use the database view instead of table A. However, this adds an extra layer of complexity into maintaining the reports (since developers may need to maintain the database view as well as the reports) and may be deprecated or even blocked by the DBAs - consequently, I would only recommend using this approach if a lot of existing reports are affected.
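A compatibility view of this kind might look like the sketch below, assuming SQLite through Python's sqlite3 module and the post-migration tables from the earlier answer. The view name A_legacy and the variable old_report are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Post-migration layout: the names live in B, and A.Name points at B.Id.
CREATE TABLE B (
    Id INTEGER PRIMARY KEY, Firstname TEXT, Middlename TEXT, Lastname TEXT
);
CREATE TABLE A (
    Id INTEGER PRIMARY KEY, Country TEXT, Age INTEGER,
    Name INTEGER REFERENCES B(Id)
);
INSERT INTO B VALUES (10, 'John', 'Fuller', 'Doe'), (11, 'Jane', 'Margaret', 'Smith');
INSERT INTO A VALUES (1, 'US', 45, 10), (2, 'UK', 32, 11);

-- The view exposes the same columns the old table A had.
CREATE VIEW A_legacy AS
SELECT ta.Id, ta.Country, ta.Age, tb.Firstname, tb.Middlename, tb.Lastname
FROM A AS ta
JOIN B AS tb ON tb.Id = ta.Name;
""")

# The pre-migration report query runs unchanged apart from the table name.
old_report = conn.execute(
    "SELECT Lastname, Firstname, Age, Country FROM A_legacy ORDER BY Lastname"
).fetchall()
```

Each affected report then only needs its FROM clause changed, at the cost of one more database object to maintain.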
