Database design - Friend activities - database

Currently I am designing a small Twitter/Facebook kind of system, wherein a user should be able to see his friends' latest activities.
I am using ASP.NET with MySQL database.
My Friendships table is as follows:
|Friendshipid|friend1|Friend2|confirmed|
Friend1 and Friend2 in the above table are userids.
The user activities table is designed as follows:
|activityId|userid|activity|Dated|
Now, I am looking for the best way to query the latest 50 friend activities for a user.
For example, if Tom logs into the system, he should be able to see the latest 50 activities among all his friends.
Any pointers on the best practices, a query or any information is appreciated.

It largely depends on what data is stored in the Friendships table. For example, what order are the Friend1 and Friend2 fields stored in? If, for the fields (friend1, friend2) the tuple (1, 2) exists, will (2, 1) exist also?
If this is not the case, then this should work:
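SELECT Activities.*
FROM Activities
INNER JOIN Friendships
ON (Activities.userid = Friendships.friend1 AND Friendships.friend2 = [my own id])
OR (Activities.userid = Friendships.friend2 AND Friendships.friend1 = [my own id])
WHERE Friendships.confirmed = TRUE
ORDER BY Activities.Dated DESC
LIMIT 50;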

If you have database performance concerns, you may redefine the friendship table as follows:
friendshipid, userid, friendid, confirmed
When you query the latest 50 activities, the SQL would be:
SELECT act.*
FROM Activities AS act
INNER JOIN
Friendships AS fs
ON fs.friendid = act.userid
AND fs.userid = 'logon_user_id'
AND confirmed = TRUE
ORDER BY act.dated DESC
LIMIT 50;
And if there is an index on the Friendships(userid) column, it gives the database the chance to optimize the query.
The redefined friendship table needs two tuples for each friendship, (userid, friendid) and its mirror (friendid, userid), but it still obeys the business rules and has a performance benefit when you need it.
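To make the two-row design concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than ASP.NET and MySQL, purely so it can be run as-is. The helper names (add_friendship, latest_friend_activities), the index name and the sample data are assumptions for illustration, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Friendships (
    friendshipid INTEGER PRIMARY KEY,
    userid       INTEGER NOT NULL,
    friendid     INTEGER NOT NULL,
    confirmed    INTEGER NOT NULL DEFAULT 0,
    UNIQUE (userid, friendid)          -- also serves as the index on userid
);
CREATE TABLE Activities (
    activityId INTEGER PRIMARY KEY,
    userid     INTEGER NOT NULL,
    activity   TEXT NOT NULL,
    dated      TEXT NOT NULL
);
CREATE INDEX idx_activities_user_dated ON Activities (userid, dated);
""")


def add_friendship(conn, a, b, confirmed=True):
    """Store one friendship as two mirrored rows in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO Friendships (userid, friendid, confirmed) VALUES (?, ?, ?)",
            [(a, b, int(confirmed)), (b, a, int(confirmed))],
        )


def latest_friend_activities(conn, user_id, limit=50):
    """Return the newest activities of the user's confirmed friends."""
    return conn.execute(
        """
        SELECT act.activityId, act.userid, act.activity, act.dated
        FROM Activities AS act
        INNER JOIN Friendships AS fs
            ON fs.friendid = act.userid
           AND fs.userid = ?
           AND fs.confirmed = 1
        ORDER BY act.dated DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()


# Hypothetical sample data: Tom is friends with Ann and Bob; Eve is unconfirmed.
TOM, ANN, BOB, EVE = 1, 2, 3, 4
add_friendship(conn, TOM, ANN)
add_friendship(conn, BOB, TOM)
add_friendship(conn, TOM, EVE, confirmed=False)
with conn:
    conn.executemany(
        "INSERT INTO Activities (userid, activity, dated) VALUES (?, ?, ?)",
        [
            (ANN, "posted a photo", "2024-01-01 10:00:00"),
            (BOB, "joined a group", "2024-01-02 09:00:00"),
            (EVE, "wrote a note", "2024-01-03 08:00:00"),
            (TOM, "updated profile", "2024-01-04 07:00:00"),
        ],
    )

feed = latest_friend_activities(conn, TOM)
```

Because both directions of a friendship are stored, the query only ever filters on userid, whichever side originally sent the request. The cost is that every insert, confirmation and removal has to touch both rows, which is why the sketch wraps them in one transaction.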

Related

SaaS- Tenant Specific Lookup Data in Shared Database

I am developing a multitenant SaaS-based application and am using a shared database that stores all the tenant records with the help of a TenantId column.
Now the problem is that I have some lists of lookup records that need to be shared by all the tenants, for example a list of games.
GamesTable
Id
GameName
Also have another table used for storing only tenant specific records
TenantGames
Id
TenantId
GameName
The basic need is that I want to use the data from both tables and get the necessary details (Game_Name) while joining with another transaction table like UserGames. How can I achieve this with this design? Here Game_Name can come from either the shared GamesTable or the tenant-specific TenantGames table.
Is there any other DB design that allows me to mix common master data and tenant master data with a JOIN?
The basic requirement is to keep common data and allow customization for the tenants if they want to add any new items.
This is the design I would then use.
Games
Id
GameName
IsTenantSpecific
SomeGameSpecificColumn
TenantGames
GameId
TenantId
SomeTenantSpecificColumn
AnotherTenantSpecificColumn
Then you can query that table in a Join with:
...
FROM
Games
INNER JOIN UserGames ON
UserGames.GameId = Games.Id
LEFT JOIN TenantGames ON
TenantGames.GameId = Games.Id AND
TenantGames.TenantId = #tenantId
WHERE
TenantGames.TenantId IS NOT NULL OR
Games.IsTenantSpecific = 0
Game specific fields can be put in the Games table. Tenant specific fields can be added to the TenantGames table, and those fields will be NULL if it is not a tenant specific customization.
We have a SaaS-based database, and we keep common data and tenant data in the same table.
Concept
GamesTable
Id NOT NULL
TenantId NULL
GameName NOT NULL
Add a unique key on TenantId and GameName.
If TenantId is NULL, you know it is common data.
If TenantId is NOT NULL, you know it belongs to a specific tenant, and exactly which one.
"Is there any other DB design which allows me to do mix both common
master data and tenant master data with JOIN?"
Yes
SELECT *
FROM GamesTable where TenantId = 'your tenant id'
UNION
SELECT *
FROM GamesTable where TenantId IS NULL -- common
This is a classic example of "many to many".
Table: Games
------------
GameID
GameName
IsMasterGame
TennantGames
------------------
GameID
TennantID
Tennants
------------
TennantID
...
To get the games for a given tenant, you would run a query like:
select g.*
from Games g
where g.isMasterGame = true
union
select g.*
from Games g,
TennantGames tg
where g.GameID = tg.GameID
and g.isMasterGame = false
and tg.TennantID = $currentTennant
(Apologies for archaic join syntax)
The union allows you to ask two questions: first, which games apply to everyone (isMasterGame = true), and second, which games apply to the current tenant (tg.TennantID = $currentTennant). Logically, tenant games cannot also be master games.
You can merge the tables leaving TenantId as NULL for records you wish to not be Tenant specific.
Games
Id
TenantId
GameName
Then you can query that table in a Join with:
...
FROM
Games
INNER JOIN UserGames ON
UserGames.GameId = Games.Id
WHERE
Games.TenantId = #tenantId OR
Games.TenantId IS NULL
This will save you the trouble of ensuring that the Id is unique between the tables, unless you are using a UNIQUEIDENTIFIER for the Id.
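As a rough, runnable illustration of the merged-table approach, the sketch below uses Python's sqlite3 module. The TenantId column on UserGames, the helper names and the sample rows are assumptions made for the example, since the question does not show the UserGames table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Games (
    Id       INTEGER PRIMARY KEY,
    TenantId INTEGER NULL,             -- NULL marks common (shared) data
    GameName TEXT NOT NULL
);
-- Hypothetical transaction table; its columns are assumed for this example.
CREATE TABLE UserGames (
    UserId   INTEGER NOT NULL,
    TenantId INTEGER NOT NULL,
    GameId   INTEGER NOT NULL REFERENCES Games (Id)
);
INSERT INTO Games (Id, TenantId, GameName) VALUES
    (1, NULL, 'Chess'),
    (2, NULL, 'Checkers'),
    (3, 10,   'Tenant 10 Trivia'),
    (4, 20,   'Tenant 20 Bingo');
INSERT INTO UserGames (UserId, TenantId, GameId) VALUES
    (100, 10, 1),
    (100, 10, 3),
    (200, 20, 2),
    (200, 20, 4);
""")


def games_for_tenant(conn, tenant_id):
    """Common games plus the games that belong to this tenant."""
    rows = conn.execute(
        "SELECT GameName FROM Games "
        "WHERE TenantId = ? OR TenantId IS NULL ORDER BY Id",
        (tenant_id,),
    ).fetchall()
    return [r[0] for r in rows]


def user_game_names(conn, tenant_id, user_id):
    """Resolve a user's games to names with a single join."""
    rows = conn.execute(
        """
        SELECT g.GameName
        FROM Games AS g
        INNER JOIN UserGames AS ug ON ug.GameId = g.Id
        WHERE ug.UserId = ?
          AND ug.TenantId = ?
          AND (g.TenantId = ? OR g.TenantId IS NULL)
        ORDER BY g.Id
        """,
        (user_id, tenant_id, tenant_id),
    ).fetchall()
    return [r[0] for r in rows]


tenant_10_games = games_for_tenant(conn, 10)
user_100_games = user_game_names(conn, 10, 100)
```

With a single Games table there is only one Id space, so UserGames.GameId can be an ordinary foreign key regardless of whether the game is common or tenant-specific.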

Database tables: One-to-many of different types

Due to non-disclosure at my work, I have created an analogy of the situation. Please try to focus on the problem and not on "Why don't you rename this table, merge those tables, etc.", because the actual problem is much more complex.
Here's the deal:
Let's say I have an "Employee Pay Rise" record that has to be approved.
There is a table with single "Users".
There are tables that group Users together, for example, "Managers", "Executives", "Payroll", "Finance". These groupings are different types with different properties.
When creating a "PayRise" record, the user who is creating the record also selects both a number of these groups (managers, executives etc) and/or single users who can 'approve' the pay rise.
What is the best way to relate a single "EmployeePayRise" record to 0 or more user records, and 0 or more of each of the groupings?
I would assume that the users are linked to the groups? If so in this case I would just link the employeePayRise record to one user that it applies to and the user that can approve. So basically you'd have two columns representing this. The EmployeePayRise.employeeId and EmployeePayRise.approvalById columns. If you need to get to groups, you'd join the EmployeePayRise.employeeId = Employee.id records. Keep it simple without over-complicating your design.
My first thought was to create a table that relates individual approvers to pay rise rows.
create table pay_rise_approvers (
pay_rise_id integer not null references some_other_pay_rise_table (pay_rise_id),
pay_rise_approver_id integer not null references users (user_id),
primary key (pay_rise_id, pay_rise_approver_id)
);
You can't have good foreign keys that reference managers sometimes, and reference payroll some other times. Users seems the logical target for the foreign key.
If the person creating the pay rise rows (not shown) chooses managers, then the user interface is responsible for inserting one row per manager into this table. That part's easy.
A person that appears in more than one group might be a problem. I can imagine a vice-president appearing in both "Executive" and "Finance" groups. I don't think that's particularly hard to handle, but it does require some forethought. Suppose the person who entered the data changed her mind, and decided to remove all the executives from the table. Should an executive who's also in finance be removed?
Another problem is that there's a pretty good chance that not every user should be allowed to approve a pay rise. I'd give some thought to that before implementing any solution.
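Here is a small sketch of the one-row-per-approver idea, written with Python's sqlite3 module so it can be run directly. The single group_members table stands in for the separate Managers/Executives/Finance tables and is an assumption for brevity, as are the helper names and sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE pay_rises (pay_rise_id INTEGER PRIMARY KEY);
-- Hypothetical stand-in for the separate group tables.
CREATE TABLE group_members (
    group_name TEXT NOT NULL,
    user_id    INTEGER NOT NULL REFERENCES users (user_id),
    PRIMARY KEY (group_name, user_id)
);
CREATE TABLE pay_rise_approvers (
    pay_rise_id          INTEGER NOT NULL REFERENCES pay_rises (pay_rise_id),
    pay_rise_approver_id INTEGER NOT NULL REFERENCES users (user_id),
    PRIMARY KEY (pay_rise_id, pay_rise_approver_id)
);
INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO pay_rises VALUES (500);
INSERT INTO group_members VALUES
    ('Executives', 1), ('Executives', 2), ('Finance', 2), ('Finance', 3);
""")


def add_user_approver(conn, pay_rise_id, user_id):
    """Add a single user as approver; an existing row is left untouched."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO pay_rise_approvers VALUES (?, ?)",
            (pay_rise_id, user_id),
        )


def add_group_approvers(conn, pay_rise_id, group_name):
    """Expand a group into one approver row per member."""
    with conn:
        conn.execute(
            """
            INSERT OR IGNORE INTO pay_rise_approvers
                (pay_rise_id, pay_rise_approver_id)
            SELECT ?, user_id FROM group_members WHERE group_name = ?
            """,
            (pay_rise_id, group_name),
        )


add_group_approvers(conn, 500, "Executives")
add_group_approvers(conn, 500, "Finance")   # Bob is in both groups
add_user_approver(conn, 500, 3)             # Carol was already added via Finance
approvers = [
    r[0]
    for r in conn.execute(
        "SELECT pay_rise_approver_id FROM pay_rise_approvers "
        "WHERE pay_rise_id = ? ORDER BY pay_rise_approver_id",
        (500,),
    )
]
```

The primary key collapses a person who appears in several groups into one row. Note that this sketch does not record which group a row came from, so it leaves the "remove all the executives" question open; answering it would need an extra column or table.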
I know it looks ugly, but I think sometimes the solution can be to have the table_name in the table and a union query
create table approve_pay_rise (
rise_proposal varchar2(10) -- foreign key to payrise table
, approver varchar2(10) -- key of record in table named in other_table
, other_table varchar2(15) );
insert into approve_pay_rise values ('prop000001', 'e0009999', 'USERS');
insert into approve_pay_rise values ('prop000001', 'm0002200', 'MANAGERS');
Then resolve the approvers either in code with a case statement, with repeated statements for each other_table value (select ... where other_table = '' .. select ... where other_table = ''), or with a union select.
I have to admit I shudder when I encounter it, and I'll now go wash my hands after typing a recommendation to do it, but it works.
Sounds like you might need two tables ("ApprovalUsers" and "ApprovalGroups"). The SELECT statement(s) would be a UNION of UserIds from the "ApprovalUsers" and the UserIDs from any other groups of users that are the "ApprovalGroups" related to the PayRiseId.
SELECT UserID
INTO #TempApprovers
FROM ApprovalUsers
WHERE PayRiseId = 12345
IF EXISTS (SELECT GroupName FROM ApprovalGroups WHERE GroupName = 'Executives' AND PayRiseId = 12345)
BEGIN
INSERT INTO #TempApprovers (UserID)
SELECT UserID
FROM Executives
END
....
EDIT: this would/could duplicate UserIds, so you would probably want to GROUP BY UserID (i.e. SELECT UserID FROM #TempApprovers GROUP BY UserID)

Avoid SQL Cursor in this scenario

I have inherited a system which seemingly requires me to use a cursor or while loop.
Given the tables below, I would like to get the names of the attendees, e.g.
Bill
Bob
Jane
Jill
Attendees
SourceTable|SourceTableId
Boys |1
Boys |2
Girls |2
Girls |1
Boys
Id|FirstName
1 |Bill
2 |Bob
Girls
Id|FirstName
1 |Jill
2 |Jane
Note, the system doesn't actually use Attendees,Boys & Girls but rather uses Contracts, Orders and other such entities etc but it was easier\simpler to represent in this form.
There may be loads more lookup tables than just "boy" and "girl", so is there any way I can achieve this without using cursors or other row-based operations?
If I understand correctly, this query should work:
SELECT FirstName
FROM Attendees
join Boys on id = SourceTableId
WHERE SourceTable = 'Boys'
union all
SELECT FirstName
FROM Attendees
join Girls on id = SourceTableId
WHERE SourceTable = 'Girls'
A union is probably the only way you're going to do this, probably encapsulated in a view. If you can get a list of the tables then you could write a code generator that generates the view. If necessary put the view in a different database or schema on the same server if the vendor won't allow you to put it in the application DB.
Can you programmatically identify the tables and columns you need or get a list from somewhere?
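A code generator for such a view could look roughly like the sketch below. It is written in Python against an in-memory SQLite database only so that it runs anywhere; the function name build_attendee_view_sql, the view name AttendeeNames, and the assumption that every lookup table has Id and FirstName columns are all illustrative.

```python
import sqlite3


def build_attendee_view_sql(source_tables, view_name="AttendeeNames"):
    """Return CREATE VIEW text with one UNION ALL branch per lookup table."""
    if not source_tables:
        raise ValueError("at least one lookup table is required")
    branches = []
    for table in source_tables:
        # Table names are spliced into the SQL text, so accept only plain
        # identifiers to avoid injecting arbitrary SQL.
        if not table.isidentifier():
            raise ValueError(f"unsafe table name: {table!r}")
        branches.append(
            f"SELECT a.SourceTable, a.SourceTableId, t.FirstName\n"
            f"FROM Attendees AS a\n"
            f"JOIN {table} AS t ON t.Id = a.SourceTableId\n"
            f"WHERE a.SourceTable = '{table}'"
        )
    return f"CREATE VIEW {view_name} AS\n" + "\nUNION ALL\n".join(branches)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Attendees (SourceTable TEXT NOT NULL, SourceTableId INTEGER NOT NULL);
CREATE TABLE Boys  (Id INTEGER PRIMARY KEY, FirstName TEXT NOT NULL);
CREATE TABLE Girls (Id INTEGER PRIMARY KEY, FirstName TEXT NOT NULL);
INSERT INTO Attendees VALUES ('Boys', 1), ('Boys', 2), ('Girls', 2), ('Girls', 1);
INSERT INTO Boys  VALUES (1, 'Bill'), (2, 'Bob');
INSERT INTO Girls VALUES (1, 'Jill'), (2, 'Jane');
""")

# Here the list of lookup tables is discovered from the data itself.
tables = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT SourceTable FROM Attendees ORDER BY SourceTable"
    )
]
view_sql = build_attendee_view_sql(tables)
conn.execute(view_sql)
names = sorted(r[0] for r in conn.execute("SELECT FirstName FROM AttendeeNames"))
```

The generator would need to be re-run whenever a new lookup table appears, but the queries that read the view stay set-based and never need a cursor.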

Updating redundant/denormalized data automatically in SQL Server

I use a high level of redundant, denormalized data in my DB designs to improve performance. I'll often store data that would normally need to be joined or calculated. For example, if I have a User table and a Task table, I would store the Username and UserDisplayName redundantly in every Task record. Another example of this is storing aggregates, such as storing the TaskCount in the User table.
User
UserID
Username
UserDisplayName
TaskCount
Task
TaskID
TaskName
UserID
UserName
UserDisplayName
This is great for performance since the app has many more reads than insert, update or delete operations, and since some values like Username change rarely. However, the big drawback is that the integrity has to be enforced via application code or triggers. This can be very cumbersome with updates.
My question is: can this be done automatically in SQL Server 2005/2010, maybe via a persisted/permanent View? Would anyone recommend another possible solution or technology? I've heard document-based DBs such as CouchDB and MongoDB can handle denormalized data more effectively.
You might want to first try an Indexed View before moving to a NoSQL solution:
http://msdn.microsoft.com/en-us/library/ms187864.aspx
and:
http://msdn.microsoft.com/en-us/library/ms191432.aspx
Using an Indexed View would allow you to keep your base data in properly normalized tables and maintain data-integrity while giving you the denormalized "view" of that data. I would not recommend this for highly transactional tables, but you said it was heavier on reads than writes so you might want to see if this works for you.
Based on your two example tables, one option is:
1) Add a column to the User table defined as:
TaskCount INT NOT NULL DEFAULT (0)
2) Add a Trigger on the Task table defined as:
CREATE TRIGGER UpdateUserTaskCount
ON dbo.Task
AFTER INSERT, DELETE
AS
;WITH added AS
(
SELECT ins.UserID, COUNT(*) AS [NumTasks]
FROM INSERTED ins
GROUP BY ins.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount + added.NumTasks)
FROM dbo.[User] usr
INNER JOIN added
ON added.UserID = usr.UserID
;WITH removed AS
(
SELECT del.UserID, COUNT(*) AS [NumTasks]
FROM DELETED del
GROUP BY del.UserID
)
UPDATE usr
SET usr.TaskCount = (usr.TaskCount - removed.NumTasks)
FROM dbo.[User] usr
INNER JOIN removed
ON removed.UserID = usr.UserID
GO
3) Then do a View that has:
SELECT u.UserID,
u.Username,
u.UserDisplayName,
u.TaskCount,
t.TaskID,
t.TaskName
FROM dbo.[User] u
INNER JOIN dbo.Task t
ON t.UserID = u.UserID
And then follow the recommendations from the links above (WITH SCHEMABINDING, Unique Clustered Index, etc.) to make it "persisted". While it is inefficient to do an aggregation in a subquery in the SELECT as shown above, this specific case is intended to be denormalized in a situation that has higher reads than writes. So doing the Indexed View will keep the entire structure, including the aggregation, physically stored so each read will not recalculate it.
Now, if a LEFT JOIN is needed because some Users do not have any Tasks, then the Indexed View will not work, because indexed views do not allow outer joins (one of the many restrictions on creating them). In that case, you can create a real table (UserTask) that is your denormalized structure and have it populated either via a Trigger on just the User table (assuming you use the Trigger shown above, which updates the User table based on changes in the Task table), or you can skip the TaskCount field in the User table and just have Triggers on both tables to populate the UserTask table. In the end, this is basically what an Indexed View does, just without you having to write the synchronization Trigger(s).
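To see trigger-maintained counts in action without a SQL Server instance, here is a sketch using Python's sqlite3 module. It is not the T-SQL trigger above: SQLite triggers fire once per row (NEW/OLD) rather than once per statement (INSERTED/DELETED), and the third trigger, which handles a task being reassigned to another user, is an addition for illustration. Trigger names and sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    UserID    INTEGER PRIMARY KEY,
    Username  TEXT NOT NULL,
    TaskCount INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Task (
    TaskID   INTEGER PRIMARY KEY,
    TaskName TEXT NOT NULL,
    UserID   INTEGER NOT NULL REFERENCES User (UserID)
);

-- Row-level triggers keep the redundant TaskCount in step with Task.
CREATE TRIGGER task_insert AFTER INSERT ON Task
BEGIN
    UPDATE User SET TaskCount = TaskCount + 1 WHERE UserID = NEW.UserID;
END;

CREATE TRIGGER task_delete AFTER DELETE ON Task
BEGIN
    UPDATE User SET TaskCount = TaskCount - 1 WHERE UserID = OLD.UserID;
END;

CREATE TRIGGER task_reassign AFTER UPDATE OF UserID ON Task
WHEN OLD.UserID <> NEW.UserID
BEGIN
    UPDATE User SET TaskCount = TaskCount - 1 WHERE UserID = OLD.UserID;
    UPDATE User SET TaskCount = TaskCount + 1 WHERE UserID = NEW.UserID;
END;
""")

with conn:
    conn.execute("INSERT INTO User (UserID, Username) VALUES (1, 'ann'), (2, 'bob')")
    conn.executemany(
        "INSERT INTO Task (TaskName, UserID) VALUES (?, ?)",
        [("write spec", 1), ("review spec", 1), ("deploy", 2)],
    )
    conn.execute("DELETE FROM Task WHERE TaskName = 'deploy'")
    conn.execute("UPDATE Task SET UserID = 2 WHERE TaskName = 'review spec'")

# Stored counts versus counts recomputed from the Task table.
stored_counts = dict(conn.execute("SELECT UserID, TaskCount FROM User"))
actual_counts = dict(
    conn.execute("SELECT UserID, COUNT(*) FROM Task GROUP BY UserID")
)
```

Comparing stored_counts with actual_counts is also a cheap consistency check that can be run periodically against any trigger-maintained aggregate.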

joining latest of various usermetadata tags to user rows

I have a PostgreSQL database with a user table (userid, firstname, lastname) and a usermetadata table (userid, code, content, created datetime). I store various information about each user in the usermetadata table by code and keep a full history. So, for example, a user (userid 15) has the following metadata:
15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'
15, 'QHE', '8', '2008-08-24 12:07:08.660519-04'
15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'
15, 'QHE', '10', '2008-08-24 08:47:57.672058-04'
I need to fetch a list of all my users and the most recent value of each of various usermetadata codes. I did this programmatically and it was, of course godawful slow. The best I could figure out to do it in SQL was to join sub-selects, which were also slow and I had to do one for each code.
This is actually not that hard to do in PostgreSQL because it has the "DISTINCT ON" clause in its SELECT syntax (DISTINCT ON isn't standard SQL).
SELECT DISTINCT ON (code) code, content, createtime
FROM metatable
WHERE userid = 15
ORDER BY code, createtime DESC;
That will limit the returned results to the first result per unique code, and if you sort the results by the create time descending, you'll get the newest of each.
I suppose you're not willing to modify your schema, so I'm afraid my answer might not be of much help, but here goes...
One possible solution would be to have the time field empty until it was replaced by a newer value, when you insert the 'deprecation date' instead. Another way is to expand the table with an 'active' column, but that would introduce some redundancy.
The classic solution would be to have both 'Valid-From' and 'Valid-To' fields where the 'Valid-To' fields are blank until some other entry becomes valid. This can be handled easily by using triggers or similar. Using constraints to make sure there is only one item of each type that is valid will ensure data integrity.
Common to these is that there is a single way of determining the set of current fields. You'd simply select all entries with the active user and a NULL 'Valid-To' or 'deprecation date' or a true 'active'.
You might be interested in taking a look at the Wikipedia entry on temporal databases and the article A consensus glossary of temporal database concepts.
A subselect is the standard way of doing this sort of thing. You just need a Unique Constraint on UserId, Code, and Date - and then you can run the following:
SELECT *
FROM Table
JOIN (
SELECT UserId, Code, MAX(Date) as LastDate
FROM Table
GROUP BY UserId, Code
) as Latest ON
Table.UserId = Latest.UserId
AND Table.Code = Latest.Code
AND Table.Date = Latest.Date
WHERE
Table.UserId = #userId
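Here is a runnable sketch of the same subselect join, using Python's sqlite3 module instead of PostgreSQL and leaving out the per-user filter so that it returns the latest value of every code for all users at once. The column name created, the shortened timestamps and the extra row for user 16 are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usermetadata (
    userid  INTEGER NOT NULL,
    code    TEXT NOT NULL,
    content TEXT NOT NULL,
    created TEXT NOT NULL,
    UNIQUE (userid, code, created)   -- guarantees one row per latest timestamp
);
INSERT INTO usermetadata VALUES
    (15, 'QHS', '20', '2008-08-24 13:36:33'),
    (15, 'QHE', '8',  '2008-08-24 12:07:08'),
    (15, 'QHS', '21', '2008-08-24 09:44:44'),
    (15, 'QHE', '10', '2008-08-24 08:47:57'),
    (16, 'QHS', '5',  '2008-08-23 10:00:00');  -- hypothetical second user
""")

LATEST_SQL = """
SELECT m.userid, m.code, m.content
FROM usermetadata AS m
JOIN (
    -- One row per (userid, code) carrying the newest timestamp.
    SELECT userid, code, MAX(created) AS last_created
    FROM usermetadata
    GROUP BY userid, code
) AS latest
    ON m.userid  = latest.userid
   AND m.code    = latest.code
   AND m.created = latest.last_created
ORDER BY m.userid, m.code
"""

latest = conn.execute(LATEST_SQL).fetchall()
```

The grouped subselect runs once for the whole table, so this replaces the one-subselect-per-code approach from the question with a single join. The unique constraint doubles as an index that can support both the grouping and the join back.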

Resources