What should I use as a primary key for this table?

I'm building a small forum component for a website where sub-forums have different admins and mods responsible for them, and users can be banned from individual sub-forums. The ban table I have looks like this:
_Bantable_
user_id
group_id
start_date
end_date
banned_by
comment
At first I was going to use the first four columns as the primary key, but now I'm wondering if it would matter if I use one at all, since no-one would be banned at the same exact time from the same forum, and regardless I'd still have to check if they were already banned and during what interval. Should I just not use a key here, and simply create an index on the user_id, and group_id and search through those when needed?

It wasn't 100% clear, but it sounds like you want temporary ban functionality on a per user basis for a particular groupId. If this is the case, you should make a composite primary key:
user_id,
group_id,
end_date
This will let you do
SELECT * FROM bantable WHERE user_id=$currentUserToCheck AND group_id=$currentGroupToCheck AND end_date > $currentDate
or something like that
Note: if you want your primary key to be coherent in terms of whatever database design principle you're adhering to, then you can just make the primary key the user_id (because it is indeed a unique identifier), and then make a composite index on the three columns that I specified above.
Be absolutely sure that any queries you run against this table that require individual indexes have those indexes correctly generated.
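As a rough illustration of the composite-key idea, here is a minimal sketch using Python's sqlite3 module with an in-memory database. The column types, the sample row, and the helper name is_banned are assumptions made for the example; the check treats a ban as in force when the current date falls between start_date and end_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bantable (
        user_id    INTEGER NOT NULL,
        group_id   INTEGER NOT NULL,
        start_date TEXT    NOT NULL,  -- ISO-8601 dates sort correctly as text
        end_date   TEXT    NOT NULL,
        banned_by  INTEGER,
        comment    TEXT,
        PRIMARY KEY (user_id, group_id, end_date)  -- composite key from the answer
    )
""")
# Hypothetical sample row: user 7 banned from group 3 during January 2024.
conn.execute(
    "INSERT INTO bantable VALUES (?, ?, ?, ?, ?, ?)",
    (7, 3, "2024-01-01", "2024-02-01", 1, "spam"),
)


def is_banned(user_id, group_id, now):
    """Return True if the user has a ban in this group that covers `now`."""
    row = conn.execute(
        "SELECT 1 FROM bantable "
        "WHERE user_id = ? AND group_id = ? "
        "AND start_date <= ? AND end_date > ?",
        (user_id, group_id, now, now),
    ).fetchone()
    return row is not None
```

The primary key's leading columns (user_id, group_id) are what the lookup filters on, so the key's index can serve this query without a separate index.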

Do you need the historical record of past bans?
If not, just create a composite PK on {user_id, group_id}. Whatever data is currently in the _Bantable_ determines who is currently banned. When the ban expires, just delete the corresponding row (and consider whether you need the end_date at all [1]).
If you do need the historical record, put an active ban into your original table as before, but when the ban expires, don't just delete it - instead move it into a separate "history" table, which would have a surrogate PK [2] independent of {user_id, group_id} (so the same user/group pair can be in multiple rows) and a trigger that prevents time overlaps (something like this).
[1] If this is the date at which the ban is going to end, then you do need it. If this is the date the ban has ended, then you don't - the row will be gone by then.
[2] Or alternatively, a PK on {user_id, group_id, start_date}.
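The active/history split could be sketched roughly as follows, using Python's sqlite3 module. The table names active_bans and ban_history, the column types, and the helper archive_expired are assumptions for illustration; the overlap-preventing trigger mentioned above is left out to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE active_bans (
        user_id    INTEGER NOT NULL,
        group_id   INTEGER NOT NULL,
        start_date TEXT    NOT NULL,
        end_date   TEXT    NOT NULL,
        PRIMARY KEY (user_id, group_id)   -- at most one active ban per pair
    );
    CREATE TABLE ban_history (
        ban_id     INTEGER PRIMARY KEY,   -- surrogate key
        user_id    INTEGER NOT NULL,
        group_id   INTEGER NOT NULL,
        start_date TEXT    NOT NULL,
        end_date   TEXT    NOT NULL
    );
""")


def archive_expired(now):
    """Move bans whose end_date has passed into ban_history; return how many moved."""
    with conn:  # copy and delete commit together, or roll back together
        conn.execute(
            "INSERT INTO ban_history (user_id, group_id, start_date, end_date) "
            "SELECT user_id, group_id, start_date, end_date "
            "FROM active_bans WHERE end_date <= ?",
            (now,),
        )
        cur = conn.execute("DELETE FROM active_bans WHERE end_date <= ?", (now,))
    return cur.rowcount
```

Running the copy and the delete in one transaction means an expired ban is never lost or duplicated if the process fails halfway.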

Why don't you just take the user_id as the primary key? I mean, you don't even have to use auto_increment (which obviously would not make any sense here).
Assuming you'd request the user_id anyway on login, this would probably give the best performance when checking whether a ban entry exists at all.

Related

Either of 2 columns is always redundant -- is there a better solution?

Say, I want to create a form for a feedback. If a registered user submits a feedback, his email address is used automatically because he's authenticated. If an anonymous user does that, he has to enter his email manually. My table would look like this:
feedbacks(id, user_id, email, body)
As you can see, it has a redundant column: either user_id or email. And for those who aren't familiar with the database structure it'll be confusing: why both email and user_id? Can they both be null? Or both have a value at the same time? In reality, exactly one of them must have a value, which isn't possible to achieve at the database level using constraints. Also, what if I insert values in both columns by mistake?
Thus, I wonder, is there any way to change the structure so that it's cleaner and the issue described above is resolved? Using a trigger isn't what I'm looking for.
In other words, the issue is "either of 2 columns is always redundant".
If you had several mutually exclusive columns, then you might have a good case for something called entity sub-typing. As it is, there is no good design reason for adding all of the extra overhead of this design pattern.
These are the basic options that you have:
1. Two mutually exclusive columns in one table - This is your current design. This is a good design because it lets you define a proper foreign key constraint on your user_id. You mention that it may be confusing for people that don't know the database well because the same kind of information might appear in one or the other place in the table. However, it's important to remember that even though both columns contain a string that happens to be in the form of an email address, to your system these things are semantically distinct. One is a foreign key to your user table. The other is a means of contacting (or identifying?) a non-member. You could avoid this apparent confusion in one of two ways: (a) give a more descriptive name to your email column, such as non_member_email or (b) create a view that coalesces user_id and email into a single column for displaying this information to people who would otherwise be confused.
2. Entity Subtyping - This approach has you create separate tables for logically separate groups of predicates (columns). These are joined together by a supertype table which gives a common primary key for all logical subtypes, as well as holding all other common predicates. You can google around to learn more about this design pattern. As I've already mentioned, this is overkill for your case because you only have one pair of mutually exclusive columns. If you think it's confusing to have this then having three tables (supertype, member subtype, non-member subtype) will really be confusing.
3. Column Overloading - This approach would have you combine both columns into a single one. This is feasible because you only need room in your table for one email address at a time. This is a terrible idea because it prevents you from creating a declarative referential constraint on user_id which is a very important tool for maintaining your data's referential integrity. It also conflates two semantically different pieces of information, which violates good database design principles.
The best choice is number 1. Don't worry about having two mutually exclusive columns. If you think you can't "comment" your way around the confusion this might cause by using more descriptive column names, then use a view to hide the "complexity" of storing two things that look similar in two separate columns.
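A view of the kind suggested in option (b) might look roughly like this, sketched with Python's sqlite3 module. The users table with an email column, the view name feedback_contacts, and the sample rows are assumptions for the example; the view resolves a member's email through the foreign key and falls back to non_member_email otherwise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE feedbacks (
        id               INTEGER PRIMARY KEY,
        user_id          INTEGER REFERENCES users(id),  -- set for members
        non_member_email TEXT,                          -- set for anonymous posters
        body             TEXT
    );
    -- One contact column for readers who should not care about the two sources.
    CREATE VIEW feedback_contacts AS
        SELECT f.id,
               COALESCE(u.email, f.non_member_email) AS contact_email,
               f.body
        FROM feedbacks AS f
        LEFT JOIN users AS u ON u.id = f.user_id;

    INSERT INTO users VALUES (1, 'member@example.com');
    INSERT INTO feedbacks VALUES (10, 1, NULL, 'from a member');
    INSERT INTO feedbacks VALUES (11, NULL, 'guest@example.com', 'from a guest');
""")

rows = conn.execute(
    "SELECT id, contact_email FROM feedback_contacts ORDER BY id"
).fetchall()
```

Both rows come back with a single contact_email value, while the underlying table keeps the foreign key intact.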
If one must be exclusively filled:
create table feedbacks (
    id integer,
    user_id text,
    email text,
    body text,
    check ((user_id is null)::int + (email is null)::int = 1)
);
The cast from boolean to integer yields either 1 or 0, so the sum must be 1.
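The same constraint can be tried out without a PostgreSQL server. The sketch below uses Python's sqlite3 module, where IS NULL already evaluates to 0 or 1, so no cast is needed; the helper try_insert is a hypothetical name introduced only for this demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE feedbacks (
        id      INTEGER PRIMARY KEY,
        user_id TEXT,
        email   TEXT,
        body    TEXT,
        -- exactly one of the two columns must be NULL
        CHECK ((user_id IS NULL) + (email IS NULL) = 1)
    )
""")


def try_insert(user_id, email):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO feedbacks (user_id, email, body) VALUES (?, ?, 't')",
                (user_id, email),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Rows with only user_id or only email are accepted; rows with both or neither are rejected.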
Remove the email field. If the user is registered, enter their user_id as you do now. If the user is not registered, search the user table for an anonymous entry with that email address. If one exists, use that user_id. Otherwise, create an entry in the user table named 'Anonymous', storing the address, and use the newly created user_id. There are two advantages:
You don't need mutually exclusive fields. As you have already noticed, these can be the cause of a lot of confusion and extra work to keep the data clean.
If an anonymous poster later registers, the existing "anonymous" user entry can be updated, thus preserving the user_id value and preserving all feedback (and any other activity you track for anonymous users) entered before registering. That is, if a user anonymously enters a few feedbacks then registers, the previous feedbacks remain associated with the now named user.
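A minimal sketch of this approach, using Python's sqlite3 module, might look like the following. The table layout and the helper names anonymous_user_id and register are assumptions for illustration, and the sketch identifies anonymous entries simply by the name 'Anonymous', as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL
    );
    CREATE TABLE feedbacks (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        body    TEXT
    );
""")


def anonymous_user_id(email):
    """Find the anonymous entry for this address, creating it if needed."""
    row = conn.execute(
        "SELECT id FROM users WHERE email = ? AND name = 'Anonymous'", (email,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO users (name, email) VALUES ('Anonymous', ?)", (email,)
    )
    return cur.lastrowid


def register(email, name):
    """Turn the anonymous entry into a named user; the id stays the same."""
    conn.execute(
        "UPDATE users SET name = ? WHERE email = ? AND name = 'Anonymous'",
        (name, email),
    )
```

Because register only updates the existing row, feedback rows that already point at that id stay attached to the newly named user.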
I might be misunderstanding the question, but why do you say it is impossible to do with constraints?
t=# CREATE TABLE feedbacks (
t(# id integer,
t(# user_id text CHECK (case when email is null then user_id is distinct from null else user_id is null end),
t(# email text CHECK (case when user_id is null then email is distinct from null else email is null end),
t(# body text
t(# );
CREATE TABLE
t=# insert into feedbacks select 1,null,null,'t';
ERROR: new row for relation "feedbacks" violates check constraint "feedbacks_check1"
DETAIL: Failing row contains (1, null, null, t).
t=# insert into feedbacks select 1,'t','t','t';
ERROR: new row for relation "feedbacks" violates check constraint "feedbacks_check1"
DETAIL: Failing row contains (1, t, t, t).
t=# insert into feedbacks select 1,'t',null,'t';
INSERT 0 1
t=# insert into feedbacks select 1,null,'t','t';
INSERT 0 1
t=# select * from feedbacks ;
 id | user_id | email | body
----+---------+-------+------
  1 | t       |       | t
  1 |         | t     | t
(2 rows)

Relational database: indirect reference to a "foreign key"

I have a data schema similar to the following:
USERS:
id
name
email
phone number
...
PHOTOS:
id
width
height
filepath
...
I have an auditing table for any changes to the system
LOGS:
id
acting_user
date
record_type (enum: "users", "photos", "...")
record_id
record_field
new_value
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other tables? And effectively, the record_type and record_id together are a foreign key to the record in the other table? Is this an anti-pattern? (Note: new_value, and all the things we would be logging, are the same data type: strings.)
Is this an anti-pattern?
Yes. Any pattern that makes you enforce referential integrity manually [1] is an anti-pattern.
Here is why using FOREIGN KEYs is so important and here is what to do in cases like yours.
Is there a name for this setup where an enum in one of the fields refers to the name of one of the other table?
There is no standard term that I know of, but I have heard people call it "generic" or "polymorphic" FKs.
[1] As opposed to FOREIGN KEYs built into the DBMS.
Actually, I think 'Anti-Pattern' is a pretty good name for this set up, but it can be a realistic way to go - especially in this example.
I'll add a similar example with a new table which records LIKES of users' photos, etc., and show why it's bad. Then I'll explain why it might not be too bad for your LOGS example.
The LIKES table is:
Id
LikedByUserId
RecordType ("users", "photos", "...")
RecordId
This is pretty much the same as the LOGS table. The problem with this is that you cannot make RecordId a foreign key to the USERS table as well as to the PHOTOS table as well as any other tables. If User 1234 is being liked, you couldn't insert it unless there was a PHOTO with ID 1234 and so on. For this reason, no RDBMS that I know of will let a single Foreign Key reference multiple Primary Keys - after all, Primary means 'only one' amongst other things.
So you'd have to create the LIKES table with no referential integrity. This may not be a bad thing sometimes, but in this case I think I'd want an important table such as LIKES to have valid entries.
To do LIKES properly, I would create the table as:
Id
LikedByUserId (allow null)
PhotoId (allow null)
OtherThingId (allow null)
...and create the appropriate foreign keys. This will actually make queries that read the data easier to read and maintain and probably more efficient too.
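One way to picture the "one nullable foreign key per target" layout is the sketch below, using Python's sqlite3 module. The snake_case column names, the liked_user_id column, making liked_by_user_id mandatory, and the CHECK that exactly one target is set are all assumptions added for the example rather than part of the table listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE photos (id INTEGER PRIMARY KEY, filepath TEXT);
    CREATE TABLE likes (
        id               INTEGER PRIMARY KEY,
        liked_by_user_id INTEGER NOT NULL REFERENCES users(id),
        liked_user_id    INTEGER REFERENCES users(id),   -- set when a user is liked
        photo_id         INTEGER REFERENCES photos(id),  -- set when a photo is liked
        -- assumption: each like points at exactly one target
        CHECK ((liked_user_id IS NULL) + (photo_id IS NULL) = 1)
    );
    INSERT INTO users  VALUES (1, 'alice');
    INSERT INTO photos VALUES (1234, 'cat.jpg');
""")


def like_photo(user_id, photo_id):
    """Return True if the like was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO likes (liked_by_user_id, photo_id) VALUES (?, ?)",
                (user_id, photo_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here the database itself rejects a like for a photo or user that does not exist, which the RecordType/RecordId layout cannot do declaratively.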
However, for a table like LOGS, which probably isn't central to the functionality of my system and which I'm only querying ad hoc to check what's been happening, I might not want to put in the extra effort and add the complexity that results in more efficient reading. I'm not sure I would actually skip it, though. It is an anti-pattern but depending on usage it might be OK.
To emphasise the point, I would only do this if the system never queried the table; if the only people who look at the data are admins running ad-hoc queries against it then it might be OK.
Cheers -

How to delete a row with primary key id?

I have a SQL table user. It has 3 columns: id (set as the primary key, with Is Identity set to Yes), name and password. When I insert data into the table, the id is incremented. But a delete query only deletes the name and password. I need to delete a particular row including the id.
For example:
id:1 name:abc password:123
id:2 name:wer password:234
id:3 name:lkj password:222
id:4 name:new password:999
I need to delete the third row, i.e. id:3 name:lkj password:222. But after deleting this row, the table should be shown as below.
id:1 name:abc password:123
id:2 name:wer password:234
id:3 name:new password:999
From the additional information you have provided, it seems you do not understand how IDENTITY columns work. As others, including myself, have said, numbers are not reused.
You should also avoid changing primary keys just because a row was deleted.
It seems you need a row number; don't use the key for this. Create a view using the ROW_NUMBER function, something like
SELECT ROW_NUMBER() OVER (Order by id) AS row_number, name, password, ...
FROM [Your_Table]
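To see the effect on the sample data from the question, here is a small sketch using Python's sqlite3 module (it assumes a bundled SQLite of version 3.25 or later, which is when window functions were added). The view name user_numbered is hypothetical, and the password column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);
    INSERT INTO user (name) VALUES ('abc'), ('wer'), ('lkj'), ('new');

    -- The display number is computed on read; the stored id never changes.
    CREATE VIEW user_numbered AS
        SELECT ROW_NUMBER() OVER (ORDER BY id) AS row_num, id, name
        FROM user;

    DELETE FROM user WHERE id = 3;
""")

rows = conn.execute(
    "SELECT row_num, id, name FROM user_numbered ORDER BY row_num"
).fetchall()
```

After the delete, the view numbers the remaining rows 1, 2, 3 while the row named 'new' keeps its original id of 4.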
As @Tony said, once a number has been used, it isn't available anymore. A workaround for this problem is the following:
1. Don't use an Identity field at all. Use just an integer field set as primary key.
2. Declare a trigger which is triggered whenever a new row is inserted.
3. This trigger has to read the ID of the last inserted row in the table, increment it by one, and insert the result in the ID field.
So when you delete this row later, the ID is available again.
If you want to reuse the id later, that is an extremely poor idea. Don't go down that path. The only ways to do that either cause performance problems or are very error-prone when you have race conditions. There is a reason why identities don't reuse values, after all. The id should be meaningless anyway. There is usually no reason why it can't skip values except personal preference. But personal preference should not take precedence over performance and reliability. If you want to do this because you hate the skipped values, then don't. If you are getting this requirement from above, then push back. Tell them that the alternatives are more time-consuming, less reliable and far more likely to cause data integrity problems.

Database design, should I use varchar for Primary Key in this case?

I'm building a webpage where users will be able to create accounts, and every account will have its own subdomain. So there could be URLs like this:
www.user1.domain.com
www.user2.domain.com
...
They will have their own pages too, like this:
www.user1.domain.com/url-1/
www.user1.domain.com/url-2/
www.user2.domain.com/url-3/
...
So I need to store account_url and page_url in database.
I did it like this, I have users, accounts and pages tables.
This is what my tables look like:
USERS:
user_id PK
user_name
user_pass
...
ACCOUNTS:
account_id PK
user_id FK
account_url
account_name
account_type
...
PAGES:
page_id PK
user_id FK
page_url
page_name
page_content
...
Now the problem is this, since I get url like this:
www.user1.domain.com/page-url/
The only information I can fetch from the URL is account_url and page_url; the dispatcher/router gets these two variables. account_url is the subdomain, and page_url is the segment after the domain.
Since there will be multiple users I always need to get that user_id so I can update/delete rows that belong to them. So I need to update page_content where user_id belongs to this user and page_url is the one from the URL.
But I don't have user_id. And when I would like to update page_url_content, first I need to find user_id, like this:
SELECT user_id FROM accounts WHERE account_url = something
And then when I have user_id I can update content of a page or do any other action.
So is this a good design?
It's normalized and clean, but when I'm using this in every action inside a controller I need to fetch user_id first just to be able to do the real query I wanted.
Now, I could use account_url for the Primary Key, and have all tables relate to that primary key. So when I get the URL I already know the Primary Key since it's in the URL.
Is this a good case to use the Primary Key in the URL, or am I doing something wrong?
I prefer to always have my primary ID keys as integers for joins. That said, there are a bunch of ways to help make your site snappy.
You could index the account_url column so look ups are more efficient.
Or you could store the user's ID in a cookie and use that value instead of querying the database each time. Granted, you would want to do some session tracking so someone can't spoof someone else.
One presumes the user will be in control of the name of the subdomain, so embedding the user ID into the subdomain name probably wouldn't be effective; otherwise, it is also an option.
You could keep user ID and user account_url in a separate table and cache that table so you don't hit the database for the vast majority of lookups.
My recommendation would be to keep the primary key the integer, index the account_url and identify a page load target time; say completing all database access and page rendering in under 1.5 seconds. When your site starts to respond over your threshold, then you can analyze your site to see where the actual problems lie and address them then.
In general, leave the database normalized as much as possible. If and when you can provably show (using metrics and actual measurements) that you need to denormalize for performance reasons, then think about doing that.
In this case, if you have a m-1 relationship between a domain and a user's account, you can effectively treat the domain as a user ID; you just have to join things in the right way. (and by m-1, I mean a single domain can only be "owned" by 1 user).
The key thing is that you don't need to get the user_id because you can get to it by joining the ACCOUNTS table as needed since it ties the domain to the user_id.
Lastly, to your question about using the domain as the primary key, you can do this, since a domain is required to be "unique", but you have a minimal overhead and much more flexibility by using a surrogate primary key.
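The join-based approach could be sketched as below with Python's sqlite3 module: the page update resolves user_id from account_url inside the same statement, so the controller never fetches it separately. The column types, sample rows, and the helper update_page are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        account_id  INTEGER PRIMARY KEY,       -- surrogate key stays an integer
        user_id     INTEGER NOT NULL,
        account_url TEXT NOT NULL UNIQUE       -- UNIQUE also gives an index for lookups
    );
    CREATE TABLE pages (
        page_id      INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL,
        page_url     TEXT NOT NULL,
        page_content TEXT
    );
    INSERT INTO accounts VALUES (1, 10, 'user1'), (2, 20, 'user2');
    INSERT INTO pages VALUES (1, 10, 'url-1', 'old'), (2, 20, 'url-1', 'other');
""")


def update_page(account_url, page_url, content):
    """Update one page identified only by values taken from the URL."""
    with conn:
        cur = conn.execute(
            "UPDATE pages SET page_content = ? "
            "WHERE page_url = ? "
            "AND user_id = (SELECT user_id FROM accounts WHERE account_url = ?)",
            (content, page_url, account_url),
        )
    return cur.rowcount  # number of rows changed
```

An unknown subdomain makes the subquery return NULL, so no rows match and nothing is updated.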
You have two totally separate issues. Mapping subdomains and pages to a user is the easier of the two. The more difficult issue is "State". You need to create a state database (or similar module) to keep track of which user is currently logged in and whether they are still logged in when an update is received.
JZ touched on this in his comment. Don't confuse these two issues; they are separate and should be treated as such.

How to store old version of ID String in Database Design?

I am building a small database for a lab. We have some rules to make an ID String for every Item, so I do not want to store it in my database. The problem is that sometimes a change in the data, for example when the person responsible for that item changes, causes the ID String to change. But I cannot correct it on printed docs. How can I store the old version of that ID String?
I could simply not change it, but that would break the rules. Any suggestions?
To expand on Damir's point
A "Smart Key" is what you say when
We have some rules to make an ID String for every Item
You're taking the name of the item, maybe a category code and adding
person responsible for that item
So if I were responsible for Beakers that item ID might be
GLASSWARE-BEAKER-SPAGE
That 'code' becomes a 'Smart key' when you use it in your database as a Primary Key.
This is an anti-pattern. Like most anti-patterns it's seductive. People like the idea of just looking at the key and knowing what kind of thing it is, what it is called and who do I ask to get more. All that information on a report or shelf-label with just a few characters. But it's an anti-pattern for the reason you mentioned - it has meaning and meaning can be changed.
As Damir suggests, you can store this value in another column that we'd call an ALTERNATE KEY or CANDIDATE KEY... it's unique, it could be a PK but it's not. You'll want a unique constraint on the column but not a Primary Key constraint.
It is important to distinguish between a primary key which is supposed to uniquely identify a row in a table and some kind of a smart key that products in catalogs usually have.
For a primary key use an auto-incrementing integer -- very few exceptions to this one.
Add columns for things that you are trying to represent in that smart key, like: Person, Project, Response etc.
Add a separate column for that key and treat it like any other field in the table -- this should keep people who are used to this kind of thinking happy.
Smart key is a misnomer here, from a db-design point, that key is rather dumb.
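Put together, the advice above might look something like this sketch using Python's sqlite3 module. The table and column names, the item_id_string_history table for remembering printed IDs, the category-name-person rule for building the ID String, and the helper change_responsible are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        item_id     INTEGER PRIMARY KEY,      -- surrogate key, never changes
        category    TEXT NOT NULL,
        name        TEXT NOT NULL,
        responsible TEXT NOT NULL,
        id_string   TEXT NOT NULL UNIQUE      -- alternate key, allowed to change
    );
    CREATE TABLE item_id_string_history (
        item_id       INTEGER NOT NULL REFERENCES items(item_id),
        old_id_string TEXT NOT NULL,          -- value that may exist on printed docs
        replaced_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO items VALUES (1, 'GLASSWARE', 'BEAKER', 'SPAGE', 'GLASSWARE-BEAKER-SPAGE');
""")


def change_responsible(item_id, person):
    """Reassign an item, keep its old ID String in history, return the new one."""
    with conn:
        category, name, old = conn.execute(
            "SELECT category, name, id_string FROM items WHERE item_id = ?",
            (item_id,),
        ).fetchone()
        new = f"{category}-{name}-{person}"  # hypothetical naming rule
        conn.execute(
            "INSERT INTO item_id_string_history (item_id, old_id_string) VALUES (?, ?)",
            (item_id, old),
        )
        conn.execute(
            "UPDATE items SET responsible = ?, id_string = ? WHERE item_id = ?",
            (person, new, item_id),
        )
    return new
```

Other tables would reference item_id, so a changed ID String touches only this one row, and an old printed ID can still be looked up through the history table.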
for example when the person responsible for that item changes, causes the ID String to change
Looks like the workflow in your lab is broken. IDs should never change. Try to bring this to the attention of your superiors.
