What is an efficient way of implementing hellbanning?


I was working on a forum component for a bigger project and considered adding in a hellban feature where a mod may prevent a user's posts from being viewed by anyone but that user. This basically enforces the "don't feed the troll" rule, forcing everyone to ignore the troublemaker. Meanwhile the troublemaker likely becomes bored as he doesn't succeed in getting a rise out of anyone and hopefully moves on.
My first thought was to add a "hellbanned" column to the post table, and create a "hellbanned" table. A hellbanned user would have their user_id added as a record to the hellbanned table, and henceforth all their future posts would have their hellbanned column set to true.
So a query showing all a topic's posts would simply show all posts where 'hellbanned = False'. And, a post operation would check if the user was in the hellban table, and if so, set the post's 'hellbanned' column to True.
I can't help thinking there is a better way to do this; I'd really appreciate some suggestions.

Hellbanning exists at the level of the user, not individual posts, so you don't have to keep the flag at the level of the posts table at all - in fact, doing that would open you to data inconsistencies (e.g. an application bug may lead to an "incompletely" hellbanned user).
Instead, put the hellbanned user IDs in a separate table (and if your DBMS supports it, cluster it to avoid an "unnecessary" table heap)...
CREATE TABLE HELLBANNED_USER (
    USER_ID INT PRIMARY KEY,
    FOREIGN KEY (USER_ID) REFERENCES USER (USER_ID)
)
...and when the time comes to exclude the hellbanned user's posts, do it similarly to this:
SELECT * FROM POST
WHERE USER_ID NOT IN (
    SELECT USER_ID FROM HELLBANNED_USER
)
This should perform nicely due to the index on HELLBANNED_USER.USER_ID.
The hellbanned users are still in the regular USER table, so everything else can keep working for them without significant changes to your code.
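One thing the query above doesn't cover is that the hellbanned user should still see their own posts. Here is a minimal sketch of that, using Python's sqlite3 module with an in-memory database. The table name users (instead of USER, which is reserved in some DBMSs), the body column, the sample rows and the visible_posts helper are all my own assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE hellbanned_user (
        user_id INTEGER PRIMARY KEY REFERENCES users (user_id)
    );
    CREATE TABLE post (
        post_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (user_id),
        body    TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'troll');
    INSERT INTO post VALUES (1, 1, 'hello'), (2, 2, 'flame bait');
    INSERT INTO hellbanned_user VALUES (2);
""")


def visible_posts(conn, viewer_id):
    """Return the post IDs the viewer may see: everything except posts by
    hellbanned users, unless the viewer wrote the post themselves."""
    rows = conn.execute(
        """
        SELECT post_id FROM post
        WHERE user_id = ?
           OR user_id NOT IN (SELECT user_id FROM hellbanned_user)
        ORDER BY post_id
        """,
        (viewer_id,),
    )
    return [post_id for (post_id,) in rows]
```

With this data, alice (user 1) sees only her own post, while the hellbanned user 2 sees both and has no way to tell the difference.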
Obviously, once the user is hellbanned as above, all of their posts (even those made before the hellbanning) become invisible. If you don't want that, add a HELLBANNED_DATE field to the hellbanned table and then hide only the posts made after the hellbanning, similarly to...
SELECT * FROM POST
WHERE NOT EXISTS (
    SELECT * FROM HELLBANNED_USER
    WHERE POST.USER_ID = HELLBANNED_USER.USER_ID
        AND POST_DATE >= HELLBANNED_DATE
)
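To see the date-based variant in action, here is a rough sketch in Python with sqlite3. I'm assuming dates are stored as ISO-8601 text (so plain string comparison orders them correctly), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hellbanned_user (
        user_id         INTEGER PRIMARY KEY,
        hellbanned_date TEXT NOT NULL      -- ISO-8601 text (assumption)
    );
    CREATE TABLE post (
        post_id   INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        post_date TEXT NOT NULL
    );
    -- User 2 was hellbanned on 1 March; one post before, one after.
    INSERT INTO hellbanned_user VALUES (2, '2024-03-01');
    INSERT INTO post VALUES
        (1, 2, '2024-02-15'),
        (2, 2, '2024-03-05'),
        (3, 1, '2024-03-06');
""")

# Posts shown to everyone: hide only what a hellbanned user wrote
# on or after the date of their hellban.
public_ids = [
    row[0]
    for row in conn.execute("""
        SELECT post_id FROM post
        WHERE NOT EXISTS (
            SELECT 1 FROM hellbanned_user
            WHERE post.user_id = hellbanned_user.user_id
              AND post.post_date >= hellbanned_user.hellbanned_date
        )
        ORDER BY post_id
    """)
]
```

Here post 1 (written before the hellban) stays visible, post 2 is hidden, and post 3 by an ordinary user is unaffected.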
Alternatively, you could just keep the HELLBANNED flag (and/or HELLBANNED_DATE) in the USER table, but you'd need to be careful to index it properly for good performance.
This might actually be a better solution than the HELLBANNED_USER, if you need to JOIN with the USER anyway (to display additional user information for each post), so the flag is readily reachable without doing the additional search through the HELLBANNED_USER table.
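For comparison, a sketch of the flag-in-USER variant, again with Python's sqlite3. The table name users, the column names, the index name and the topic_posts helper are hypothetical. Whether an index on such a low-cardinality flag actually helps depends on your DBMS and data, so treat that line as something to measure rather than a recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        hellbanned INTEGER NOT NULL DEFAULT 0   -- 0 = normal, 1 = hellbanned
    );
    CREATE INDEX idx_users_hellbanned ON users (hellbanned);
    CREATE TABLE post (
        post_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (user_id),
        body    TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'alice', 0), (2, 'troll', 1);
    INSERT INTO post VALUES (1, 1, 'hello'), (2, 2, 'flame bait');
""")


def topic_posts(conn, viewer_id):
    """Posts with their author's name, as seen by viewer_id. The JOIN is
    needed for the name anyway, so the flag comes along for free."""
    return conn.execute(
        """
        SELECT post.post_id, users.name
        FROM post
        JOIN users ON users.user_id = post.user_id
        WHERE users.hellbanned = 0 OR users.user_id = ?
        ORDER BY post.post_id
        """,
        (viewer_id,),
    ).fetchall()
```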

Related

Many to many relations and history tables

Suppose I have Item and Tag, each of which have an id and name column only, and an Item_Tag_Map table that has a composite Item.id, Tag.id primary key.
If I want to implement a history table for Item and Tag, this seems relatively straightforward - I can add a third column revision and a trigger to copy into an ItemHistory or TagHistory table with id, revision as primary key and operation ("INSERT","UPDATE",etc). Since I may want to "delete" items, I can go about this one of two ways:
Add another column on Item or Tag for is_active, and do not actually delete any rows ever
Delete rows, but record the deletion in the history table as a delete operation, and on an Item or Tag insert, make sure to get the latest revision number from the ItemHistory or TagHistory table with that item, and set it to be that
The second option leaves a bad taste in my mouth, so I am fine with using the first. After all, why should I really ever need to delete an item when I can just modify it or change its active status?
Now, I've run into the same problem for the history table on the Item_Tag_Map table, but this time, neither option seems all that attractive. If I choose to add an is_active for the Item_Tag_Map, the logic of finding out whether a tag is mapped to an item changes from:
Get ALL tag_mapping for THESE items
to
Get ALL tag_mapping for THESE items WHERE is_active
The implicit idea that the presence of a mapping means that the mapping exists goes away. The set of unmapped item-tags not only includes all the ones that are not present in the table, but also the ones where is_active is false.
On the other hand, if I choose the second option, it's still rather ugly.
I'm sure people have run into this problem many times before, and I am interested in learning how you have dealt with it.

My answer depends on a few things, so I'll try to state my assumptions.
No matter what, I think is_active on Item and Tag is OK. If the record count grows very fast on those two entities, then consider running a nightly job to move the inactive records to an archived version of the tables. This can be used for reporting or auditing later. You can also write a script to restore records if you need to, but the idea is that your real-time tables are fast and free of deleted data.
If you allow the user to add/update/delete mappings, then I would consider the table the same as Item and Tag. Add the flag and use it in your queries. It doesn't seem ugly to me - I've seen it before.
If the mapping table isn't under user control, then I would guess you would use the is_active flag on either Item or Tag to determine whether or not a query could be run.
Just know that once you add that flag, people will forget to use it. I know I've done it many times ("Why did I get so many records, what am I missing? Oh yeah, is_active...").
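One way to make the flag harder to forget is to hide it behind a view and have day-to-day queries read the view. This is my own suggestion rather than something from the question, sketched here in Python with sqlite3. The view name active_item_tag, the snake_case table name and the tags_for_items helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_tag_map (
        item_id   INTEGER NOT NULL,
        tag_id    INTEGER NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 1,
        PRIMARY KEY (item_id, tag_id)
    );
    -- Queries that read the view cannot forget the is_active filter.
    CREATE VIEW active_item_tag AS
        SELECT item_id, tag_id FROM item_tag_map WHERE is_active = 1;
    INSERT INTO item_tag_map VALUES (1, 10, 1), (1, 11, 0), (2, 10, 1);
""")


def tags_for_items(conn, item_ids):
    """'Get ALL tag_mapping for THESE items', with soft-deleted rows hidden."""
    if not item_ids:
        return []
    placeholders = ", ".join("?" for _ in item_ids)
    return conn.execute(
        f"SELECT item_id, tag_id FROM active_item_tag "
        f"WHERE item_id IN ({placeholders}) ORDER BY item_id, tag_id",
        list(item_ids),
    ).fetchall()
```

The soft-deleted mapping (1, 11) never shows up, so "present in the view" means "mapped" again.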

What should I use as a primary key for this table?

I'm building a small forum component for a website where sub-forums have different admins and mods responsible for them, and users can be banned from individual sub-forums. The ban table I have looks like this:
_Bantable_
user_id
group_id
start_date
end_date
banned_by
comment
At first I was going to use the first four columns as the primary key, but now I'm wondering if it would matter if I use one at all, since no-one would be banned at the same exact time from the same forum, and regardless I'd still have to check if they were already banned and during what interval. Should I just not use a key here, and simply create an index on the user_id, and group_id and search through those when needed?

It wasn't 100% clear, but it sounds like you want temporary ban functionality on a per user basis for a particular groupId. If this is the case, you should make a composite primary key:
user_id,
group_id,
end_date
This will let you do
SELECT * FROM bantable WHERE user_id=$currentUserToCheck AND group_id=$currentGroupToCheck AND end_date > $currentDate
or something like that
Note: if you want your primary key to be coherent in terms of whatever database design principle you're adhering to, then you can just make the primary key the user_id (because it is indeed a unique identifier), and then make a composite index on the three columns that I specified above.
Be absolutely sure that any queries you run against this table that require individual indexes have those indexes correctly generated.

Do you need the historical record of past bans?
If not, just create a composite PK on {user_id, group_id}. Whatever data is currently in the _Bantable_ determines who is currently banned. When the ban expires, just delete the corresponding row (and consider whether you need the end_date at all [1]).
If you do need the historic record, put an active ban into your original table as before, but when the ban expires, don't just delete it. Instead, move it into a separate "history" table, which would have a surrogate PK [2] independent from {user_id, group_id} (so the same user/group pair can be in multiple rows) and a trigger that prevents time overlaps (something like this).
[1] If this is the date at which the ban is going to end, then you do need it. If this is the date the ban has ended, then you don't: the row will be gone by then.
[2] Or alternatively, a PK on {user_id, group_id, start_date}.
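A rough sketch of the two-table idea in Python with sqlite3. The names ban, ban_history, is_banned and expire_bans are made up for the example, dates are assumed to be ISO-8601 text, and the overlap-preventing trigger is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Active bans: at most one row per user and sub-forum.
    CREATE TABLE ban (
        user_id    INTEGER NOT NULL,
        group_id   INTEGER NOT NULL,
        start_date TEXT NOT NULL,
        end_date   TEXT NOT NULL,
        banned_by  INTEGER NOT NULL,
        comment    TEXT,
        PRIMARY KEY (user_id, group_id)
    );
    -- Expired bans: surrogate key, so a user/group pair may repeat.
    CREATE TABLE ban_history (
        ban_history_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        group_id   INTEGER NOT NULL,
        start_date TEXT NOT NULL,
        end_date   TEXT NOT NULL,
        banned_by  INTEGER NOT NULL,
        comment    TEXT
    );
    INSERT INTO ban VALUES (7, 3, '2024-01-01', '2024-01-08', 1, 'spam');
""")


def is_banned(conn, user_id, group_id):
    """A row in ban means the user is currently banned from that group."""
    row = conn.execute(
        "SELECT 1 FROM ban WHERE user_id = ? AND group_id = ?",
        (user_id, group_id),
    ).fetchone()
    return row is not None


def expire_bans(conn, today):
    """Move bans whose end_date has passed into ban_history, atomically."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO ban_history
                (user_id, group_id, start_date, end_date, banned_by, comment)
            SELECT user_id, group_id, start_date, end_date, banned_by, comment
            FROM ban WHERE end_date <= ?
            """,
            (today,),
        )
        conn.execute("DELETE FROM ban WHERE end_date <= ?", (today,))
```

Something like a nightly job could call expire_bans with the current date; until it runs, an already-expired row still counts as a ban, so you may want to check end_date in is_banned as well.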

Why don't you just take the user_id as the primary key? I mean, you don't even have to use auto_increment (which obviously would not make any sense here).
Guessing that you'd request the user_id anyway on login this would probably provide the best performance to look if there is even an entry for banning matters.

Database design - system default items and custom user items

This question applies to any database table design, where you would have system default items and custom user defaults of the same type (ie user can add his own custom items/settings).
Here is an example with invoicing and payment types. By default, an invoice can have payment terms of DueOnReceipt, NET10, NET15, NET30 (this is the default for all users!), so you would have two tables, "INVOICE" and "PAYMENT_TERM":
INVOICE
Id
...
PaymentTermId
PAYMENT_TERM (System default)
Id
Name
Now what is the best way to allow a user to store their own custom "PaymentTerms" and why? (ie user can use system default payment terms OR user's own custom payment terms that he created/added)
Option 1) Add UserId to PaymentTerm, set userid for the user that has added the custom item and system default userid set to null.
INVOICE
Id
...
PaymentTermId
PaymentTerm
Id
Name
UserId (System Default, UserId=null)
Option 2) Add a flag to Invoice "IsPaymentTermCustom" and Create a custom table "PAYMENT_TERM_CUSTOM"
INVOICE
Id
...
PaymentTermId
PaymentTermCustomId
IsPaymentTermCustom (True for custom, otherwise false for system default)
PaymentTerm
Id
Name
PAYMENT_TERM_CUSTOM
Id
Name
UserId
Now check via a SQL query whether the user is using a custom payment term: if IsPaymentTermCustom=True, the user is using a custom payment term; otherwise the user is using a system default.
Option 3) ????
...

As a general rule:
Prefer adding columns to adding tables
Prefer adding rows to adding columns
Generally speaking, the considerations are:
Effects of adding a table
Requires the most changes to the app: You're supporting a new kind of "thing"
Requires more complicated SQL: You'll have to join to it somehow
May require changes to other tables to add a foreign key column referencing the new table
Impacts performance because more I/O is needed to join to and read from the new table
Note that I am not saying "never add tables". Just know the costs.
Effects of adding a column
Can be expensive to add a column if the table is large (it can take hours for the ALTER TABLE ADD COLUMN to complete, and during this time the table will be locked, effectively bringing your site "down"), but this is a one-time thing
The cost to the project is low: Easy to code/maintain
Usually requires minimal changes to the app - it's a new aspect of a thing, rather than a new thing
Will perform with negligible performance difference. Will not be measurably worse, but may be a lot faster depending on the situation (if having the new column avoids joining or expensive calculations).
Effects of adding rows
Zero: If your data model can handle your new business idea by just adding more rows, that's the best option
(Pedants kindly refrain from making comments such as "there is no such thing as 'zero' impact", or "but there will still be more disk used for more rows" etc - I'm talking about material impact to the DB/project/code)
To answer the question: Option 1 is best (i.e. add a column to the payment option table).
The reasoning is based on the guidelines above and this situation is a good fit for those guidelines.
Further,
I would also store "standard" payment options in the same table, but with a NULL userid; that way you only have to add new payment options when you really have one, rather than for every customer even if they use a standard one.
It also means your invoice table does not need changing, which is a good thing - it means minimal impact to that part of your app.
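To make Option 1 concrete, here is a small sketch in Python with sqlite3. The snake_case names, the custom 'NET45' term, user id 42 and the terms_for_user helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment_term (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        user_id INTEGER            -- NULL marks a system default
    );
    CREATE TABLE invoice (
        id              INTEGER PRIMARY KEY,
        payment_term_id INTEGER NOT NULL REFERENCES payment_term (id)
    );
    INSERT INTO payment_term VALUES
        (1, 'DueOnReceipt', NULL),
        (2, 'NET10', NULL),
        (3, 'NET15', NULL),
        (4, 'NET30', NULL),
        (5, 'NET45', 42);          -- custom term added by user 42
""")


def terms_for_user(conn, user_id):
    """System defaults plus the user's own custom terms."""
    rows = conn.execute(
        "SELECT name FROM payment_term "
        "WHERE user_id IS NULL OR user_id = ? ORDER BY id",
        (user_id,),
    )
    return [name for (name,) in rows]
```

The invoice table keeps its single payment_term_id column either way, which is the "minimal impact" point above.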

It seems to me that there are merely "Payment Terms" and "Users". The decision of what are the "Default" payment terms is a business rule, and therefore would be best represented in the business layer of your application.
Assuming that you would like to have a set of pre-defined "default" payment terms present in your application from the start, these would already be present in the payment terms table. However, I would put a reference table in between USERS and PAYMENT TERMS:
USERS:
user_id
user_name
USER_PAYMENT_TERMS:
user_id
payment_term_id
PAYMENT_TERMS:
payment_term_id
payment_term
Your business layer should offer up to the user (or more likely, the administrator) through a GUI the ability to:
Assign 0 to many payment term options to a particular user (some users may not want one of the defaults to even be available, for example).
Add custom payment terms, which then become available for assignment to one or more users (but which avoids the creation of duplicate payment terms by different users)
Allow a custom payment term to be assigned to more than one user (say the user's company has a unique payment process which requires all of their users to use a payment term other than one of the defaults: create the custom term once, and assign it to all users).
Your application business layer would establish rules governing access to payment terms, which could then be accessed by your user interface.
Your UI would then (again, likely through an administrator function) allow the set up of one or more payment terms in addition to the standards you describe, and then make them available to one or more users through something like a checked list box (for example).
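The reference-table layout could look roughly like this in Python with sqlite3. The composite primary key, the UNIQUE constraint on the term text, the sample rows and the assigned_terms helper are my assumptions, not part of the schema listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE users (
        user_id   INTEGER PRIMARY KEY,
        user_name TEXT NOT NULL
    );
    CREATE TABLE payment_terms (
        payment_term_id INTEGER PRIMARY KEY,
        payment_term    TEXT NOT NULL UNIQUE   -- no duplicate terms
    );
    CREATE TABLE user_payment_terms (
        user_id         INTEGER NOT NULL REFERENCES users (user_id),
        payment_term_id INTEGER NOT NULL
            REFERENCES payment_terms (payment_term_id),
        PRIMARY KEY (user_id, payment_term_id)
    );
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO payment_terms VALUES
        (1, 'DueOnReceipt'), (2, 'NET10'), (3, '2% 10, Net 30');
    -- ann does not get NET10; the custom term 3 is shared by both users.
    INSERT INTO user_payment_terms VALUES
        (1, 1), (1, 3), (2, 1), (2, 2), (2, 3);
""")


def assigned_terms(conn, user_id):
    """Payment terms that have been made available to this user."""
    rows = conn.execute(
        """
        SELECT pt.payment_term
        FROM user_payment_terms AS upt
        JOIN payment_terms AS pt
          ON pt.payment_term_id = upt.payment_term_id
        WHERE upt.user_id = ?
        ORDER BY pt.payment_term_id
        """,
        (user_id,),
    )
    return [term for (term,) in rows]
```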

Option 1 is definitely better, for the following reasons:
Correctness
You can implement a database constraint for uniqueness of the payment term name
You can implement a foreign key constraint from Invoice to PaymentTerm
Ease of Use
Queries will be much simpler because you will always join from Invoice to PaymentTerm rather than requiring a more complex join. Most of the time when you select, you will not care whether it is an inbuilt or custom payment term. The optimizer will have an easier time with a normal join than with one that depends on another column to decide which table to join.
Easier to display a list of PaymentTerms coming from one table
We use Option 1 in our data model quite a lot.

Part of the problem, as I see it, is that different payment terms lead to different calculations, too. If I were still in the welding supply business, I'd want to add "2% 10 NET 30", which would mean a 2% discount if the payment is made in full within 10 days; otherwise, net 30.
Setting that issue aside, I think ownership of the payment terms makes sense. Assume that the table of users (not shown) includes the user "system" as, say, user_id 0.
create table payment_terms (
    payment_term_id integer primary key,
    payment_term_owner_id integer not null references users (user_id),
    payment_term_desc varchar(30) not null unique
);
insert into payment_terms values (1, 0, 'Net 10');
insert into payment_terms values (2, 0, 'Net 15');
...
insert into payment_terms values (5, 1, '2% 10, Net 30');
This keeps foreign keys simple, and it makes it easy to select payment terms at run time for presentation in the user interface.
Be very careful here. You probably want to store the description, not the ID number, with your invoices. (It's unique; you can set a foreign key reference to it.) If you store only the ID number, updating a user's custom description might subtly corrupt all the data that references it.
For example, let's say that the user created a custom payment term number 5, '2% 10, Net 30'. You store the ID number 5 in your table of invoices. Then the user decides that things will be different starting today, and updates that description to '2% 10, Net 20'. Now on all your past invoices, the arithmetic no longer matches the payment terms.
Your auditor will kill you. Twice.
You'll want to prevent ordinary users from deleting rows owned by the system user. There are several ways to do that.
Use a BEFORE DELETE trigger.
Add another table with foreign key references to the rows owned by the system user.
Restrict all access through stored procedures that prevent deleting system rows.
(And flags are almost never the best idea.)
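As an illustration of the first option, here is a sketch of a BEFORE DELETE trigger using SQLite syntax from Python's sqlite3 module. Trigger syntax differs between DBMSs, and the trigger name, the error message and the try_delete helper are made up. The system user is user_id 0, as assumed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payment_terms (
        payment_term_id       INTEGER PRIMARY KEY,
        payment_term_owner_id INTEGER NOT NULL,
        payment_term_desc     VARCHAR(30) NOT NULL UNIQUE
    );
    -- Refuse to delete any row owned by the system user (id 0).
    CREATE TRIGGER protect_system_terms
    BEFORE DELETE ON payment_terms
    FOR EACH ROW WHEN OLD.payment_term_owner_id = 0
    BEGIN
        SELECT RAISE(ABORT, 'system payment terms cannot be deleted');
    END;
    INSERT INTO payment_terms VALUES (1, 0, 'Net 10');
    INSERT INTO payment_terms VALUES (5, 1, '2% 10, Net 30');
""")


def try_delete(conn, term_id):
    """Attempt the delete; return False if the database rejected it."""
    try:
        with conn:  # rolls back if the trigger aborts the statement
            conn.execute(
                "DELETE FROM payment_terms WHERE payment_term_id = ?",
                (term_id,),
            )
        return True
    except sqlite3.DatabaseError:
        return False
```

Deleting the user-owned term 5 goes through, while the attempt on system term 1 is aborted by the trigger and the row stays in place.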

Applying general rules of database design to the problem at hand:
one table for system payment terms
one table for user payment terms
a view that is the union of the two above
Now you can join invoice on the view of payment terms.
Benefits:
No flag columns
No nulls
You separate system defaults from user data
Things become straightforward for the db
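A sketch of how that could look, in Python with sqlite3. I'm reading the view as a UNION ALL over the two tables, which is an assumption on my part. All names are hypothetical, and so is the source column: it is needed because ids can collide between the two tables. Note that a foreign key cannot reference a view, so the invoice columns below are not enforced by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE system_payment_term (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE user_payment_term (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        name    TEXT NOT NULL
    );
    -- One place to read every payment term, whatever its origin.
    CREATE VIEW payment_term AS
        SELECT 'system' AS source, id, name FROM system_payment_term
        UNION ALL
        SELECT 'user' AS source, id, name FROM user_payment_term;
    CREATE TABLE invoice (
        id                  INTEGER PRIMARY KEY,
        payment_term_source TEXT NOT NULL,
        payment_term_id     INTEGER NOT NULL
    );
    INSERT INTO system_payment_term VALUES (1, 'NET10'), (2, 'NET30');
    INSERT INTO user_payment_term VALUES (1, 42, 'NET45');
    INSERT INTO invoice VALUES (1, 'system', 2), (2, 'user', 1);
""")

# Join invoices to the view exactly as if it were a single table.
invoice_terms = conn.execute("""
    SELECT invoice.id, payment_term.name
    FROM invoice
    JOIN payment_term
      ON payment_term.source = invoice.payment_term_source
     AND payment_term.id = invoice.payment_term_id
    ORDER BY invoice.id
""").fetchall()
```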

How are viewing permissions usually implemented in a relational database?

What's the standard relational database idiom for setting permissions for items?
Answers should be general; however, they should be applicable to the example below. Anything flies: adding columns, adding another table, whatever, as long as it works well.
Application / Example
Assume the Twitter database is extremely simple: we have one User table, which contains a login and user id; we have a Tweet table, which contains a tweet id, tweet text, and creator id; and we have a Follower table, which contains the id of the person being followed and the follower.
Now, assume Twitter wants to enable advanced privacy settings (viewing permissions), so that users can pick exactly which followers can view tweets. The settings can be:
Everyone on Twitter
Only current followers (which would of course have to be approved by the user, this doesn't really matter though) EDIT: Current as in, I get a new follower, he sees it; I remove a follower, he stops seeing it.
Specific followers (e.g., user id 5, 10, 234, and 1)
Only the owner
Under these circumstances, what's the best way to represent viewing permissions? The priorities, in order, are speed of lookup (you want to be able to figure out what tweets to display to a user quickly), speed of creation (you don't want to take forever to post a tweet), and efficient use of space (every time I post a tweet to everyone on my followers' list, I shouldn't have to add a row for each and every follower I have to some table.)

Looks like a typical many-to-many relationship -- I don't see any restrictions on what you desire that would allow space savings wrt the typical relational DB idiom for those, i.e., a table with two columns (both foreign keys, one into users and one into tweets)... since the current followers can and do change all the time, posting a tweet to all the followers that are current at the instant of posting (I assume that's what you mean?) does mean adding that many (extremely short) rows to that relationship table (the alternative of keeping a timestamped history of follower sets so you can reconstruct who was a follower at any given tweet-posting time appears definitely worse in time and not substantially better in space).
If, on the other hand, you want to check followers at the time of viewing (rather than at the time of posting), then you could make a special userid artificially meaning "all followers of the current user" (just like you'll have one meaning "all users on Twitter"); the needed SQL to make the lookup fast, in that case, looks hairy but feasible (a UNION or OR with "all tweets for which I'm a follower of the author and the tweet is readable by [the artificial userid representing] all followers"). I'm not getting deep into that maze of SQL until and unless you confirm that it is this peculiar meaning that you have in mind (rather than the simple one which seems more natural to me but doesn't allow any space savings on the relationship table for the action of "post tweet to all followers").
Edit: the OP has clarified they mean the approach I mention in the second paragraph.
Then, assume userid is the primary key of the Users table, the Tweets table has a primary key tweetid and a foreign key author for the userid of each tweet's author, the Followers table is a typical many-to-many relationship table with two columns (both foreign keys into Users), follower and followee, and the Canread table is a not-so-typical many-to-many relationship table, still with two columns: the foreign key into Users is column reader, and the foreign key into Tweets is column tweet (phew;-). Two special users #everybody and #allfollowers are defined with the above meanings (so that posting to everybody, all followers, or "just myself" each adds only one row to Canread; only selective posting to a specific list of N people adds N rows).
So the SQL for the set of tweet IDs a user #me can read is, I think, something like:
SELECT Tweets.tweetid
FROM Tweets
JOIN Canread ON(Tweets.tweetid=Canread.tweet)
WHERE Canread.reader IN (#me, #everybody)
UNION
SELECT Tweets.tweetid
FROM Tweets
JOIN Canread ON(Tweets.tweetid=Canread.tweet)
JOIN Followers ON(Tweets.author=Followers.followee)
WHERE Canread.reader=#allfollowers
AND Followers.follower=#me
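To try the query out, here is a sketch in Python with sqlite3. The Users table is omitted for brevity, and the two special users are represented by the made-up ids -1 (#everybody) and -2 (#allfollowers). The sample rows and the readable_tweets helper are assumptions too.

```python
import sqlite3

EVERYBODY = -1       # stands in for the special user #everybody
ALL_FOLLOWERS = -2   # stands in for the special user #allfollowers

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tweets (tweetid INTEGER PRIMARY KEY, author INTEGER NOT NULL);
    CREATE TABLE Followers (
        follower INTEGER NOT NULL,
        followee INTEGER NOT NULL,
        PRIMARY KEY (follower, followee)
    );
    CREATE TABLE Canread (
        reader INTEGER NOT NULL,
        tweet  INTEGER NOT NULL REFERENCES Tweets (tweetid),
        PRIMARY KEY (reader, tweet)
    );
    INSERT INTO Tweets VALUES (100, 2), (101, 2), (102, 3), (103, 3), (104, 2);
    INSERT INTO Followers VALUES (1, 2);   -- user 1 follows user 2
    INSERT INTO Canread VALUES
        (-1, 100),   -- tweet 100: everybody
        (-2, 101),   -- tweet 101: all followers of its author (user 2)
        (-2, 102),   -- tweet 102: all followers of its author (user 3)
        (1, 103),    -- tweet 103: specifically user 1
        (2, 104);    -- tweet 104: only the owner, user 2
""")


def readable_tweets(conn, me):
    """Tweet IDs that user `me` may read, per the UNION query above."""
    rows = conn.execute(
        """
        SELECT Tweets.tweetid
        FROM Tweets
        JOIN Canread ON (Tweets.tweetid = Canread.tweet)
        WHERE Canread.reader IN (?, ?)
        UNION
        SELECT Tweets.tweetid
        FROM Tweets
        JOIN Canread ON (Tweets.tweetid = Canread.tweet)
        JOIN Followers ON (Tweets.author = Followers.followee)
        WHERE Canread.reader = ?
          AND Followers.follower = ?
        ORDER BY 1
        """,
        (me, EVERYBODY, ALL_FOLLOWERS, me),
    )
    return [tweetid for (tweetid,) in rows]
```

User 1 gets tweets 100, 101 and 103. Tweet 102 is for followers of user 3, whom user 1 doesn't follow, and 104 is owner-only.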

Database design 1 to 1 relationship

I designed my database incorrectly; should I fix this while it's still in development?
The "user" table is supposed to have a 1:1 relationship with the "userprofile" table;
however, in the actual design the "user" table has a 1:* relationship with the "userprofile" table.
Everything works! But should it be fixed anyway?

Do one thing
User Table
Userid(p)
UserName
othercol..
UserProfile
id(p)
UserId(f) - and unique
othercol..
Hope this way you can easily fix the issue.

Make the user_id in the user_profile table unique and it's fixed.

If it's a 1:1 relationship and you often are bringing back records from "user" table and "userprofile" together then you might consider just merging them into one table.

Yes, fix this with a unique index on the FK field. The reason why you need to fix it now is that you can't control how badly people are going to insert data over time when the database is not set up correctly with controls that do not allow the behavior you do not want.
The first time you have a duplicated record inserted into the child table, you might break a lot of code. With no unique index, the chances of a second record getting inserted can be quite high. You can say you'll control this at the application level, but that is usually a poor choice, as there is no guarantee that other applications, bulk inserts, etc. aren't going to happen that circumvent the application. Putting things right as soon as you can in a database design is critical. It becomes really hard to fix a poor design when there are a lot of records in the database.
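A quick sketch of what the unique constraint buys you, in Python with sqlite3. The table and column names, the bio column and the add_profile helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE users (
        user_id   INTEGER PRIMARY KEY,
        user_name TEXT NOT NULL
    );
    CREATE TABLE user_profile (
        id      INTEGER PRIMARY KEY,
        -- UNIQUE turns the 1:* relationship into 1:1 (or 1:0..1).
        user_id INTEGER NOT NULL UNIQUE REFERENCES users (user_id),
        bio     TEXT
    );
    INSERT INTO users VALUES (1, 'ann');
""")


def add_profile(conn, user_id, bio):
    """Insert a profile; return False if the database rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO user_profile (user_id, bio) VALUES (?, ?)",
                (user_id, bio),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The first profile for a user is accepted and a second one is rejected by the database itself, whichever application or bulk load tries to insert it.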

@pranay
User Table
Userid(p)
UserName
othercol..
UserProfile
id(p)
UserId(f) - and unique
othercol..
Is that normally how you do it (above)? Or do you do this (below)?
User Table
Userid(p)
UserName
othercol..
UserProfile
id(p) <--- userid
othercol..
