SQL Server many-to-many design recommendation - database

I have a SQL Server database with two table : Users and Achievements. My users can have multiple achievements so it a many-to-many relation.
At school we learned to create an associative table for that sort of relation. That mean creating a table with a UserID and an AchivementID. But if I have 500 users and 50 achievements that could lead to 25 000 row.
As an alternative, I could add a binary field to my Users table. For example, if that field contained 10010 that would mean that this user unlocked the first and the fourth achievements.
Is their other way ? And which one should I use.

Your alternative way isn't a very good approach at all. Not only is it not queryable (how many people unlocked achievement #10?), but it means nothing. Plus, what are you going to do if you add 5 more achievements? Update all the previous users to add "00000" to the end of their "achievements" column?
There is nothing wrong with the associative table as long as you index it properly. Using that approach the data is infinitly queryable and - perhaps more importantly - makes sense!

Related

How can I store an indefinite amount of stuff in a field of my database table?

Heres a simple version of the website I'm designing: Users can belong to one or more groups. As many groups as they want. When they log in they are presented with the groups the belong to. Ideally, in my Users table I'd like an array or something that is unbounded to which I can keep on adding the IDs of the groups that user joins.
Additionally, although I realize this isn't necessary, I might want a column in my Group table which has an indefinite amount of user IDs which belong in that group. (side question: would that be more efficient than getting all the users of the group by querying the user table for users belonging to a certain group ID?)
Does my question make sense? Mainly I want to be able to fill a column up with an indefinite list of IDs... The only way I can think of is making it like some super long varchar and having the list JSON encoded in there or something, but ewww
Please and thanks
Oh and its a mysql database (my website is in php), but 2 years of php development I've recently decided php sucks and I hate it and ASP .NET web applications is the only way for me so I guess I'll be implementing this on whatever kind of database I'll need for that.
Your intuition is correct; you don't want to have one column of unbounded length just to hold the user's groups. Instead, create a table such as user_group_membership with the columns:
user_id
group_id
A single user_id could have multiple rows, each with the same user_id but a different group_id. You would represent membership in multiple groups by adding multiple rows to this table.
What you have here is a many-to-many relationship. A "many-to-many" relationship is represented by a third, joining table that contains both primary keys of the related entities. You might also hear this called a bridge table, a junction table, or an associative entity.
You have the following relationships:
A User belongs to many Groups
A Group can have many Users
In database design, this might be represented as follows:
This way, a UserGroup represents any combination of a User and a Group without the problem of having "infinite columns."
If you store an indefinite amount of data in one field, your design does not conform to First Normal Form. FNF is the first step in a design pattern called data normalization. Data normalization is a major aspect of database design. Normalized design is usually good design although there are some situations where a different design pattern might be better adapted.
If your data is not in FNF, you will end up doing sequential scans for some queries where a normalized database would be accessed via a quick lookup. For a table with a billion rows, this could mean delaying an hour rather than a few seconds. FNF guarantees a direct access lookup path for each item of data.
As other responders have indicated, such a design will involve more than one table, to be joined at retrieval time. Joining takes some time, but it's tiny compared to the time wasted in sequential scans, if the data volume is large.

Many-to-many relations - lots of them

I have a general question about stroring relations in a database.
Let's take a social example: followers. I want to store that Friend A (id 1) is following Friend B (id 5).
I do this with a many-to-many relation in the database.
inserting:
table flw (friend,following,relation_id): (1,5,1)
And so we continue.. with a million relations from friends A,B,C,D,...
Now my question is, when I need to fetch all relations that Friend A (or B,C,...) is following.. it needs to get trough million of records. Is this still effective ? I can't image facebook goes trough a billion of rows to get my friends list.
The solution is probably to use more than one database. But is this (millions of many-to-many-relations) the (only) correct way to store what's in my example ?

Basic query about SQL Server

Why use rowguid and what are the benefits?
Suppose a company has thousands of customers, is it a good approach to divide them on the basis of their gender for the performance and fast query if no then why?
How do large companies like Facebook handle their primary key for their comments, users and for other things for example:
Suppose there are five users with primary key 1,2,3,4,5...
What if user 3 is deleted from there, now it's 1,2,4,5 will be left, which is kind of gap between continuous chain. How do they deal with it?
Don't know - maybe you use a non-auto value so you can keep it constant across other databases (maybe for use with 3rd part integration etc.)
Do not divide on a field such as gender, when you don't know gender (or want a full list) you are going to have to search two tables, also when you want to add other filtering/searching you will have to do over multiple tables again
So what if there is a gap in the ID chain - it does not effect anything. Why would you think it is important?

Database design - do I need one of two database fields for this?

I am putting together a schema for a database. The goal of the database is to track applications in our department. I have a repeated problem that I am trying to solve.
For example, I have an "Applications" table. I want to keep track if any application uses a database or a bug tracking system so right now I have fields in the Applications table called
Table: Applications
UsesDatabase (bit)
Database_ID (int)
UsesBugTracking (bit)
BugTracking_ID (int)
Table: Databases:
id
name
Table: BugTracking:
id
name
Should I consolidate the "uses" column with the respective ID columns so there is only one bug tracking column and only one database column in the applications table?
Any best practice here for database design?
NOTE: I would like to run reports like "Percent of Application that use bug tracking" (although I guess either approach could generate this data.)
You could remove the "uses" fields and make the id columns nullable, and let a null value mean that it doesn't use the feature. This is a common way of representing a missing value.
Edit:
To answer your note, you can easily get that statistics like this:
select
count(*) as TotalApplications,
count(Database_ID) as UsesDatabase,
count(BugTracking_ID) as UsesBugTracking
from
Applications
Why not get rid of the two Use fields and simply let a NULL value in the _ID fields indicate that the record does not use that application (bug tracking or database)
Either solution works. However, if you think you may want to occasionally just get a list of applications which do / do not have databases / bugtracking consider that having the flag fields reduces the query by one (or two) joins.
Having the bit fields is slightly denormalized, as you have to keep two fields in sync to keep one piece of data updated, but I tend to prefer them for cases like this for the reason I gave in the prior paragraph.
Another option would be to have the field nullable, and put null in it for those entries which do not have DBs / etc, but then you run into problems with foreign key constraints.
I don't think there is any one supreme right way, just consider the tradeoffs and go with what makes sense for your application.
I would use 3 tables for the objects: Application, Database, and BugTracking. Then I would use 2 join tables to do 1-to-many joins: ApplicationDatabases, and ApplicationBugTracking.
The 2 join tables would have both an application_id and the id of the other table. If an application used a single database, it would have a single ApplicationDatabases record joining them together. Using this setup, an application could have 0 database (no records for this app in the ApplicationDatabases table), or many databases (multiple records for this app in the ApplicationDatabases table).
"Should i consolidate the "uses" column"
If I look at your problem statement, then there either is no "uses" column at all, or there are two. In either case, it is wrong of you to speak of "THE" uses column.
May I politely suggest that you learn to be PRECISE when asking questions ?
Yes using null in the foreign key fields should be fine - it seems superfluous to have the bit fields.
Another way of doing it (though it might be considered evil by database people ^^) is to default them to 0 and add in an ID 0 data row in both bugtrack and database tables with a name of "None"... when you do the reports, you'll have to do some more work unless you present the "None" values as they are as well with a neat percentage...
To answer the edited question-
Yes, the fields should be combined, with NULL meaning that the application doesn't have a database (or bug tracker).

Do 1 to 1 relations on db tables smell?

I have a table that has a bunch of fields. The fields can be broken into logical groups - like a job's project manager info. The groupings themselves aren't really entity candidates as they don't and shouldn't have their own PKs.
For now, to group them, the fields have prefixes (PmFirstName for example) but I'm considering breaking them out into multiple tables with 1:1 relations on the main table.
Is there anything I should watch out for when I do this? Is this just a poor choice?
I can see that maybe my queries will get more complicated with all the extra joins but that can be mitigated with views right? If we're talking about a table with less than 100k records is this going to have a noticeable effect on performance?
Edit: I'll justify the non-entity candidate thoughts a little further. This information is entered by our user base. They don't know/care about each other. So its possible that the same user will submit the same "projectManager name" or whatever which, at this point, wouldn't be violating any constraint. Its for us to determine later on down the pipeline if we wanna correlate entries from separate users. If I were to give these things their own key they would grow at the same rate the main table grows - since they are essentially part of the same entity. At no pt is a user picking from a list of available "project managers".
So, given the above, I don't think they are entities. But maybe not - if you have further thoughts please post.
I don't usually use 1 to 1 relations unless there is a specific performance reason for it. For example storing an infrequently used large text or BLOB type field in a separate table.
I would suspect that there is something else going on here though. In the example you give - PmFirstName - it seems like maybe there should be a single pm_id relating to a "ProjectManagers" or "Employees" table. Are you sure none of those groupings are really entity candidates?
To me, they smell unless for some rows or queries you won't be interested in the extra columns. e.g. if for a large portion of your queries you are not selecting the PmFirstName columns, or if for a large subset of rows those columns are NULL.
I like the smells tag.
I use 1 to 1 relationships for inheritance-like constructs.
For example, all bonds have some basic information like CUSIP, Coupon, DatedDate, and MaturityDate. This all goes in the main table.
Now each type of bond (Treasury, Corporate, Muni, Agency, etc.) also has its own set of columns unique to it.
In the past we would just have one incredibly wide table with all that information. Now we break out the type-specific info into separate tables, which gives us much better performance.
For now, to group them, the fields have prefixes (PmFirstName for example) but I'm considering breaking them out into multiple tables with 1:1 relations on the main table.
Create a person table, every database needs this. Then in your project table have a column called PMKey which points to the person table.
Why do you feel that the group of fields are not an entity candidates? If they are not then why try to identify them with a prefix?
Either drop the prefixes or extract them into their own table.
It is valuable splitting them up into separate tables if they are separate logical entities that could be used elsewhere.
So a "Project Manager" could be 1:1 with all the projects currently, but it makes sense that later you might want to be able to have a Project Manager have more than one project.
So having the extra table is good.
If you have a PrimaryFirstName,PrimaryLastName,PrimaryPhone, SecondaryFirstName,SecondaryLastName,SEcondaryPhone
You could just have a "Person" table with FirstName, LastName, Phone
Then your original Table only needs "PrimaryId" and "SecondaryId" columns to replace the 6 columns you previously had.
Also, using SQL you can split up filegroups and tables across physical locations.
So you could have a POST table, and a COMMENT Table, that have a 1:1 relationship, but the COMMENT table is located on a different filegroup, and on a different physical drive with more memory.
1:1 does not always smell. Unless it has no purpose.

Resources