I have a general question about storing relations in a database.
Let's take a social example: followers. I want to store that Friend A (id 1) is following Friend B (id 5).
I do this with a many-to-many relation in the database.
inserting:
table flw (friend,following,relation_id): (1,5,1)
And so we continue, with a million relations from friends A, B, C, D, ...
Now my question is: when I need to fetch all the relations for the friends that Friend A (or B, C, ...) is following, the database has to go through millions of records. Is this still efficient? I can't imagine Facebook goes through a billion rows to get my friends list.
The solution is probably to use more than one database. But is this (millions of many-to-many relations) the (only) correct way to store what's in my example?
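A minimal sketch of the question above, assuming SQLite through Python's built-in sqlite3 module (the question does not name a database engine). The table and column names come from the question; the UNIQUE constraint, the index name flw_following_idx and the sample rows are illustrative assumptions. With an index led by the friend column, the engine can seek straight to one friend's rows rather than reading the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE flw (
        relation_id INTEGER PRIMARY KEY,
        friend      INTEGER NOT NULL,   -- the follower
        following   INTEGER NOT NULL,   -- the friend being followed
        UNIQUE (friend, following)      -- also creates an index led by friend
    );
    -- Assumed extra index for the reverse lookup (followers of a friend).
    CREATE INDEX flw_following_idx ON flw (following, friend);
    """
)

# Illustrative sample rows; (1, 5) is the relation from the question.
conn.executemany(
    "INSERT INTO flw (friend, following) VALUES (?, ?)",
    [(1, 5), (1, 7), (2, 5), (3, 1)],
)

# Everyone friend 1 is following.
following_of_1 = [
    row[0]
    for row in conn.execute(
        "SELECT following FROM flw WHERE friend = ? ORDER BY following", (1,)
    )
]

# Everyone who follows friend 5.
followers_of_5 = [
    row[0]
    for row in conn.execute(
        "SELECT friend FROM flw WHERE following = ? ORDER BY friend", (5,)
    )
]

# Ask SQLite how it would run the first lookup; the last column of each
# plan row describes the access path it chose.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT following FROM flw WHERE friend = ?", (1,)
).fetchall()

print(following_of_1, followers_of_5)
for step in plan:
    print(step[-1])
```

The plan text varies between SQLite versions, so it is printed rather than checked here. Other engines expose the same information through their own EXPLAIN facilities.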
I have recently started designing a database for one of my projects. I am confused about one simple question: "More Rows vs More Tables". I am not experienced enough to answer it myself. Any help on this will be appreciated. Here is the scenario:
Scenario
I have a Company. A Company will have many Users and Vehicles.
More Rows:
Should I have one table each for users and vehicles, with a reference to COMPANY_ID? Obviously, over time these will hold a lot of records. I have to use a GUID as the ID because of a requirement, so if there are too many records I think it will affect search performance as well.
More Tables:
Should I create 2 tables with a company prefix every time I add a new company? E.g. if I add a new company "Tesla", the table names will be TESLA_USER and TESLA_VEHICLES. Obviously, over time the number of tables will increase a lot.
My concern is: which is the more efficient way, More Rows or More Tables?
Thank you
Cheers
D
You can create a table for the companies, a table for users and a table for vehicles, in which you put all your data. Then you add two joining tables that only store the links between companies and users and between companies and vehicles.
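A rough sketch of that layout, assuming SQLite via Python's sqlite3 module and GUIDs stored as text. All table names, column names and sample rows here are hypothetical; the answer does not specify them.

```python
import sqlite3
import uuid


def new_id() -> str:
    """Return a fresh GUID as text (the question requires GUID keys)."""
    return str(uuid.uuid4())


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company  (id TEXT PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE app_user (id TEXT PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE vehicle  (id TEXT PRIMARY KEY, plate TEXT NOT NULL);

    -- Joining tables: one row per link, shared by every company.
    CREATE TABLE company_user (
        company_id TEXT NOT NULL REFERENCES company (id),
        user_id    TEXT NOT NULL REFERENCES app_user (id),
        PRIMARY KEY (company_id, user_id)
    );
    CREATE TABLE company_vehicle (
        company_id TEXT NOT NULL REFERENCES company (id),
        vehicle_id TEXT NOT NULL REFERENCES vehicle (id),
        PRIMARY KEY (company_id, vehicle_id)
    );
    """
)

tesla, acme = new_id(), new_id()
alice, bob, carol = new_id(), new_id(), new_id()

conn.executemany(
    "INSERT INTO company (id, name) VALUES (?, ?)",
    [(tesla, "Tesla"), (acme, "Acme")],
)
conn.executemany(
    "INSERT INTO app_user (id, name) VALUES (?, ?)",
    [(alice, "Alice"), (bob, "Bob"), (carol, "Carol")],
)
conn.executemany(
    "INSERT INTO company_user (company_id, user_id) VALUES (?, ?)",
    [(tesla, alice), (tesla, bob), (acme, carol)],
)

# Adding a company is one more row, not a new pair of tables.
tesla_users = [
    row[0]
    for row in conn.execute(
        """
        SELECT u.name
        FROM company_user AS cu
        JOIN app_user AS u ON u.id = cu.user_id
        WHERE cu.company_id = ?
        ORDER BY u.name
        """,
        (tesla,),
    )
]
print(tesla_users)
```

The composite primary key on each joining table leads with company_id, so the per-company lookup can use that index.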
Example
Recently I encountered an application where a Master Table is maintained that contains the data of more than 20 categories. For example, it has categories named Country, State and City.
So my question is: is it better to move each category out into a separate table and fetch the data through joins, or should everything stay inside a single table?
P.S. In the future the category count might increase to 50 or more.
P.S. The application is based on EF6 + SQL Server.
Edited Version
I just want to know what the best approach is in the above scenario: should one go with a single table with proper indexing, or follow the DB normalization approach, putting each category into a separate table and maintaining the relationships through FKs?
Normally, categories are put into separate tables. This conforms more closely with normalized database structures and the definition of entities. In particular, it allows for proper foreign key relationships to be defined. That is a big win for data integrity.
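To illustrate the data-integrity point, here is a small sketch that assumes SQLite via Python's sqlite3 module instead of the EF6 + SQL Server stack from the question. The table layout and sample rows are hypothetical. With each category in its own table, the database itself can refuse a city that points at a state that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE country (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE state (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES country (id)
    );
    CREATE TABLE city (
        id       INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        state_id INTEGER NOT NULL REFERENCES state (id)
    );

    INSERT INTO country (id, name) VALUES (1, 'United States');
    INSERT INTO state (id, name, country_id)
        VALUES (1, 'Massachusetts', 1), (2, 'Florida', 1);
    -- A city named Florida inside the state of Massachusetts.
    INSERT INTO city (id, name, state_id) VALUES (1, 'Florida', 1);
    """
)

# A city that references a missing state is rejected by the foreign key.
try:
    conn.execute(
        "INSERT INTO city (name, state_id) VALUES (?, ?)", ("Nowhere", 99)
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

city_rows = conn.execute(
    """
    SELECT city.name, state.name
    FROM city
    JOIN state ON state.id = city.state_id
    ORDER BY city.id
    """
).fetchall()
print(rejected, city_rows)
```

In a single mixed category table, nothing in the schema stops a "city" row from pointing at another city or at a country.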
Sometimes categories are put into a single table. This can, of course, be confusing; consider, for instance, "Florida, Massachusetts" or "Washington, Iowa" (these are real places).
Putting categories in one table has one major advantage: all the text is in a single location. That can be very handy for internationalization efforts. To be honest, that is the situation where I have seen this used.
I'm creating this little Access DB for the HR department to store all the data related to the training sessions that the company organizes for its employees.
So, I have a Training Session table with information like date, subject, place, observations, trainer, etc., and a unique ID number.
Then there's the Personnel table, with the employee ID (which is also the table's unique ID), names and working department.
After that, I need another table that keeps a record of all the attendees of each training session. And here's the question: should I use a table for that in the first place? Does it have to be one table per training session to store the attendees?
I've used Excel for quite some time now, but I'm very new to Access and databases (even small ones like this). Any information will be highly appreciated.
Thanks in advance!
It should be one table for persons, one table for trainings, and one for participation/attendance, to minimize (or best: avoid) repetition. Your tables should use primary and foreign keys, so that there are one-to-many relationships between trainings and attendances as well as people and attendances (the attendances table would then have a column referring to the person who attended, and another column referring to the training session).
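The three-table layout described above might look like the following sketch. It assumes SQLite via Python's sqlite3 module as a stand-in for Access, and every table name, column name and sample row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the references below
conn.executescript(
    """
    CREATE TABLE personnel (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        department  TEXT NOT NULL
    );
    CREATE TABLE training_session (
        session_id INTEGER PRIMARY KEY,
        held_on    TEXT NOT NULL,
        subject    TEXT NOT NULL
    );
    -- One attendance table for ALL sessions: one row per person per session.
    CREATE TABLE attendance (
        session_id  INTEGER NOT NULL REFERENCES training_session (session_id),
        employee_id INTEGER NOT NULL REFERENCES personnel (employee_id),
        PRIMARY KEY (session_id, employee_id)
    );

    -- Made-up sample data.
    INSERT INTO personnel VALUES (1, 'Ana', 'Sales'), (2, 'Ben', 'IT'), (3, 'Cy', 'IT');
    INSERT INTO training_session VALUES
        (10, '2024-01-15', 'Fire safety'),
        (11, '2024-02-01', 'First aid');
    INSERT INTO attendance VALUES (10, 1), (10, 2), (11, 2);
    """
)

# Who attended session 10?
attendees = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.name
        FROM attendance AS a
        JOIN personnel AS p ON p.employee_id = a.employee_id
        WHERE a.session_id = ?
        ORDER BY p.name
        """,
        (10,),
    )
]

# Which sessions did employee 2 attend?
sessions_of_ben = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.subject
        FROM attendance AS a
        JOIN training_session AS t ON t.session_id = a.session_id
        WHERE a.employee_id = ?
        ORDER BY t.held_on
        """,
        (2,),
    )
]
print(attendees, sessions_of_ben)
```

The same attendance table answers both questions, which a separate table per session could not do without querying every one of them.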
Google "database normalization" for more detail and variations of that principle (https://en.wikipedia.org/wiki/Database_normalization).
Here's a simple version of the website I'm designing: users can belong to one or more groups, as many groups as they want. When they log in, they are presented with the groups they belong to. Ideally, in my Users table I'd like an array or something unbounded to which I can keep adding the IDs of the groups that the user joins.
Additionally, although I realize this isn't necessary, I might want a column in my Group table that holds an indefinite number of user IDs belonging to that group. (Side question: would that be more efficient than getting all the users of the group by querying the user table for users belonging to a certain group ID?)
Does my question make sense? Mainly I want to be able to fill a column up with an indefinite list of IDs... The only way I can think of is making it some super long varchar and having the list JSON encoded in there or something, but ewww.
Please and thanks
Oh, and it's a MySQL database (my website is in PHP), but after 2 years of PHP development I've recently decided PHP sucks and I hate it, and ASP.NET web applications are the only way for me, so I guess I'll be implementing this on whatever kind of database I'll need for that.
Your intuition is correct; you don't want to have one column of unbounded length just to hold the user's groups. Instead, create a table such as user_group_membership with the columns:
user_id
group_id
A single user_id could have multiple rows, each with the same user_id but a different group_id. You would represent membership in multiple groups by adding multiple rows to this table.
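As a sketch of how that table gets used, assuming SQLite via Python's sqlite3 module in place of MySQL: user_group_membership and its two columns come from the answer above, while app_user, app_group, the index name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_user  (user_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE app_group (group_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    CREATE TABLE user_group_membership (
        user_id  INTEGER NOT NULL REFERENCES app_user (user_id),
        group_id INTEGER NOT NULL REFERENCES app_group (group_id),
        PRIMARY KEY (user_id, group_id)
    );
    -- Assumed second index so "all users in a group" is also a lookup.
    CREATE INDEX membership_by_group
        ON user_group_membership (group_id, user_id);

    INSERT INTO app_user  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO app_group VALUES (10, 'chess'), (20, 'hiking'), (30, 'cooking');
    INSERT INTO user_group_membership VALUES (1, 10), (1, 20), (2, 20);
    """
)

# Groups shown to user 1 at login.
groups_of_alice = [
    row[0]
    for row in conn.execute(
        """
        SELECT g.name
        FROM user_group_membership AS m
        JOIN app_group AS g ON g.group_id = m.group_id
        WHERE m.user_id = ?
        ORDER BY g.name
        """,
        (1,),
    )
]

# The side question: all users in group 20, from the same table.
users_in_hiking = [
    row[0]
    for row in conn.execute(
        """
        SELECT u.name
        FROM user_group_membership AS m
        JOIN app_user AS u ON u.user_id = m.user_id
        WHERE m.group_id = ?
        ORDER BY u.name
        """,
        (20,),
    )
]

# Joining another group is one more row, not a longer column.
conn.execute("INSERT INTO user_group_membership VALUES (?, ?)", (2, 30))
bob_group_count = conn.execute(
    "SELECT COUNT(*) FROM user_group_membership WHERE user_id = ?", (2,)
).fetchone()[0]
print(groups_of_alice, users_in_hiking, bob_group_count)
```

Because both directions are served by the one membership table, there is no need for a second list of user IDs on the group side.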
What you have here is a many-to-many relationship. A "many-to-many" relationship is represented by a third, joining table that contains both primary keys of the related entities. You might also hear this called a bridge table, a junction table, or an associative entity.
You have the following relationships:
A User belongs to many Groups
A Group can have many Users
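In database design, this might be represented with three tables: User, Group, and a joining table UserGroup whose rows each hold one user ID and one group ID.
This way, a UserGroup represents any combination of a User and a Group without the problem of having "infinite columns."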
If you store an indefinite amount of data in one field, your design does not conform to First Normal Form (1NF). 1NF is the first step in a design pattern called data normalization. Data normalization is a major aspect of database design. Normalized design is usually good design, although there are some situations where a different design pattern might be better adapted.
If your data is not in 1NF, you will end up doing sequential scans for some queries where a normalized database would be accessed via a quick indexed lookup. For a table with a billion rows, this could mean waiting an hour rather than a few seconds. With data in 1NF, each item of data can be given a direct, indexed lookup path.
As other responders have indicated, such a design will involve more than one table, to be joined at retrieval time. Joining takes some time, but it's tiny compared to the time wasted in sequential scans, if the data volume is large.
I have a SQL Server database with two tables: Users and Achievements. My users can have multiple achievements, so it is a many-to-many relation.
At school we learned to create an associative table for that sort of relation. That means creating a table with a UserID and an AchievementID. But if I have 500 users and 50 achievements, that could lead to 25,000 rows.
As an alternative, I could add a binary field to my Users table. For example, if that field contained 10010, that would mean that this user unlocked the first and the fourth achievements.
Is there another way? And which one should I use?
Your alternative way isn't a very good approach at all. Not only is it not queryable (how many people unlocked achievement #10?), but it means nothing. Plus, what are you going to do if you add 5 more achievements? Update all the previous users to add "00000" to the end of their "achievements" column?
There is nothing wrong with the associative table as long as you index it properly. Using that approach, the data is infinitely queryable and - perhaps more importantly - makes sense!
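A small sketch of the associative-table approach, assuming SQLite via Python's sqlite3 module, not SQL Server. The table name user_achievement, the index name and the sample rows are hypothetical. It answers the "how many people unlocked achievement #10?" question directly and shows that a new achievement touches no existing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user_achievement (
        user_id        INTEGER NOT NULL,
        achievement_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, achievement_id)
    );
    -- Index for per-achievement questions.
    CREATE INDEX user_achievement_by_achievement
        ON user_achievement (achievement_id, user_id);
    """
)

# Only unlocked achievements are stored, so 500 users x 50 achievements
# is an upper bound on the row count, not the typical size.
conn.executemany(
    "INSERT INTO user_achievement (user_id, achievement_id) VALUES (?, ?)",
    [(1, 1), (1, 4), (2, 10), (3, 1), (3, 10)],
)

# How many people unlocked achievement #10?
unlocked_10 = conn.execute(
    "SELECT COUNT(*) FROM user_achievement WHERE achievement_id = ?", (10,)
).fetchone()[0]

# Which achievements does user 1 have? (The "10010" case from the question.)
achievements_of_1 = [
    row[0]
    for row in conn.execute(
        "SELECT achievement_id FROM user_achievement "
        "WHERE user_id = ? ORDER BY achievement_id",
        (1,),
    )
]

# A brand-new achievement #51 needs no update to existing rows;
# it only adds a row when somebody unlocks it.
conn.execute(
    "INSERT INTO user_achievement (user_id, achievement_id) VALUES (?, ?)",
    (2, 51),
)
total_rows = conn.execute("SELECT COUNT(*) FROM user_achievement").fetchone()[0]
print(unlocked_10, achievements_of_1, total_rows)
```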