Database Design for Facebook "likes" - database

New to database design and I was wondering how to efficiently design something like Facebook likes with future scalability in mind.
Let's say you have 3 tables: users, photos and albums.
Let's say a user can like either a photo or an album.
Should I use one table for both types of likes?
This would probably mean it would have a user_id, a like_type (0 for photo, 1 for album, etc.) and a like_value (the id of whatever content it is, whether photo_id or album_id).
Or should I have a separate table for each kind of like (e.g. photos_likes and albums_likes),
each of which would only contain user_id and photo_id/album_id?
I want to make sure that the database design is clean and semi-scaleproof whether we add many more objects in the future(videos, comments, notes, etc) or have a ton of likes.
Thanks!

You could try an inherited-table approach; see "implementing table inheritance" for more in-depth detail.
Essentially it works just like inheritance in code: you have a base table 'Like' and then tables which 'inherit' from it, such as 'CommentLike', 'PhotoLike' etc.
See attached diagram for a quick mockup.
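As a rough sketch of that inherited-table idea (not the diagram itself), here is one way it could look. It uses Python's built-in sqlite3 module purely as a stand-in database; the table and column names (likes, photo_likes, album_likes, like_id) and the like_photo helper are assumptions for illustration, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE albums (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Base table: one row per like, regardless of what was liked.
CREATE TABLE likes (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id)
);

-- Child tables reuse the base row's id as their own primary key.
CREATE TABLE photo_likes (
    like_id  INTEGER PRIMARY KEY REFERENCES likes(id),
    photo_id INTEGER NOT NULL REFERENCES photos(id)
);
CREATE TABLE album_likes (
    like_id  INTEGER PRIMARY KEY REFERENCES likes(id),
    album_id INTEGER NOT NULL REFERENCES albums(id)
);
""")


def like_photo(user_id, photo_id):
    """Hypothetical helper: write the base row and the photo row together."""
    with conn:  # commits on success, rolls back if either insert fails
        cur = conn.execute("INSERT INTO likes (user_id) VALUES (?)", (user_id,))
        like_id = cur.lastrowid
        conn.execute(
            "INSERT INTO photo_likes (like_id, photo_id) VALUES (?, ?)",
            (like_id, photo_id),
        )
    return like_id


conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO photos (id, title) VALUES (10, 'sunset')")
conn.commit()
like_photo(1, 10)
```

Adding videos or comments later would mean one more child table, while anything common to all likes stays in the base table.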

Two different tables. This way if you ever have an object that you want to add likes to later you can just make a new table "object_likes" and store the likes there.
If you wanted to store them all in one table, you would need a type table, which would store all the types of objects, and in your like table you would have to reference the type_id. This would let you add types later.
To me the first method is much better.
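To make the trade-off concrete, here is a small sketch of both options side by side, again using Python's sqlite3 as a stand-in. The names like_types, type_id, object_id and try_insert are made up for the example. The point it tries to show: with a table per object the database can check that the liked photo exists, whereas a single object_id column cannot carry a plain foreign key to two different tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users  (id INTEGER PRIMARY KEY);
CREATE TABLE photos (id INTEGER PRIMARY KEY);

-- Option 1: one table per likeable object.
CREATE TABLE photos_likes (
    user_id  INTEGER NOT NULL REFERENCES users(id),
    photo_id INTEGER NOT NULL REFERENCES photos(id),
    PRIMARY KEY (user_id, photo_id)   -- a user can like a photo only once
);

-- Option 2: one table for everything, plus a type table.
CREATE TABLE like_types (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE likes (
    user_id   INTEGER NOT NULL REFERENCES users(id),
    type_id   INTEGER NOT NULL REFERENCES like_types(id),
    object_id INTEGER NOT NULL,       -- photo id or album id, not FK-checked
    PRIMARY KEY (user_id, type_id, object_id)
);

INSERT INTO users (id) VALUES (1);
INSERT INTO photos (id) VALUES (10);
INSERT INTO like_types (id, name) VALUES (0, 'photo'), (1, 'album');
""")


def try_insert(sql, params):
    """Return True if the row was accepted, False on a constraint violation."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Option 1 accepts a real photo and rejects a missing one.
opt1_real = try_insert("INSERT INTO photos_likes VALUES (?, ?)", (1, 10))
opt1_missing = try_insert("INSERT INTO photos_likes VALUES (?, ?)", (1, 999))
# Option 2 accepts a like for photo 999 even though no such photo exists.
opt2_missing = try_insert("INSERT INTO likes VALUES (?, ?, ?)", (1, 0, 999))
```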


Database design for classified ad item specification

I'm working on a classified ads site with 12 categories. E.g. category vehicles has items cars, bikes, Commercial Vehicles and spare parts. The following is a flow diagram for posting an ad:
I need to show the specification from the "Form Filled" section of the above image to users as dropdown lists in the form when they are posting an advertisement. For example, a car's specification would be its color, engine and fuel type.
The ERD is below :
How should this issue be tackled, what are the best practices and is the current design going along the right lines?
On the whole this looks ok. Here are some observations:
likes.iker_id should point at users.id? Just trying to understand your model to start.
I would probably change the pics table to be one pic per row and then add an ordinal for ordering.
One question here is how you intend to look at your graph model. As it is, you might have a graph that could be traversed easily to a depth of a couple of levels. I assume you are doing this to recommend ads. If so, I think this is sufficient. If not, it would be good to further discuss which RDBMS you are targeting.
Hope this helps:
In a simplified case, you will need some extra tables.
So, you are trying to be able to have different specifications for different items in your categories? Or, in other words, it is like having different attributes for different types of products in an e-commerce website.
If that is the problem you are tackling, then you should look into the Entity–Attribute–Value (EAV) model, which is how this problem is usually solved. By the way, one of the most popular open source e-commerce engines uses it as well.
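A minimal sketch of what EAV could look like for the ad specifications, using Python's sqlite3 as a stand-in. The table names (items, attributes, item_attribute_values), the sample rows and the specification helper are all assumptions for illustration; the real ERD is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Entity: the thing being advertised.
CREATE TABLE items (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL,
    title    TEXT NOT NULL
);
-- Attribute: every specification field that any item may carry.
CREATE TABLE attributes (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Value: one row per (item, attribute) pair that actually applies.
CREATE TABLE item_attribute_values (
    item_id      INTEGER NOT NULL REFERENCES items(id),
    attribute_id INTEGER NOT NULL REFERENCES attributes(id),
    value        TEXT NOT NULL,
    PRIMARY KEY (item_id, attribute_id)
);

INSERT INTO items VALUES (1, 'cars', 'Used hatchback'), (2, 'bikes', 'Road bike');
INSERT INTO attributes VALUES
    (1, 'color'), (2, 'engine'), (3, 'fuel type'), (4, 'frame size');
INSERT INTO item_attribute_values VALUES
    (1, 1, 'red'), (1, 2, '1.6L'), (1, 3, 'petrol'),
    (2, 1, 'black'), (2, 4, '56cm');
""")


def specification(item_id):
    """Return an item's specification as an {attribute name: value} dict."""
    rows = conn.execute(
        """
        SELECT a.name, v.value
        FROM item_attribute_values v
        JOIN attributes a ON a.id = v.attribute_id
        WHERE v.item_id = ?
        ORDER BY a.id
        """,
        (item_id,),
    ).fetchall()
    return dict(rows)


car_spec = specification(1)
bike_spec = specification(2)
```

A new kind of specification then becomes a new row in attributes rather than a new column, at the cost of every value being stored as text.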
I agree, look at EAV models.
for some other tables, you have many normalization issues - for example:
you should have a separate address table (not part of the ad)
you should have a picture table (and link those to the ads with another table)
you should have a person table - and link that to the ad as 'owner'
the idea of 'favorite' should also be in this person->ad relationship table as a role or type column

Cakephp workaround for model inheritance relationship

From my understanding, CakePHP doesn't support database inheritance relationships. However, I want to create a database with different types of Users.
In this case, there are three types of Users: Seller, Customer, and Administrator. Every user should have basic User information such as password, username, etc.
However, each type of user will have its own unique set of data. For example, a seller may have inventory_id while a customer may have something like delivery_address, etc.
I have been thinking of creating a workaround to this problem without breaking CakePHP conventions. I was going to create three additional foreign keys, admin_id, seller_id and customer_id, inside the User table, each linking to another table. However, since this is an IS-A relationship and not HAS-A, I would have to make sure that two of the ids are always NULL. Therefore, this workaround seems ugly to me.
Is there any other simpler, better approach?
For this type of database structure I would probably look at adopting an Entity-Attribute-Value model. This would mean your customer may have a delivery_address and your user may have an inventory_id, but as far as your relationship in Cake is concerned both your user and customer would just have an attribute_id ... you can then create another table that stores what types of attributes are available.
In its simplest form, your user and customer would be attached to an *attribute_lookup* or *attribute_link* table by a hasMany (probably) relationship. That attribute_lookup/link table would be connected by a belongsTo/hasOne relationship to the actual Attribute Type and Attribute Value models.
Providing that you normalise your tables correctly, you can stick well within Cake relationship conventions.
You can read more about EAV here.
I have been thinking about this problem for some time now, and I have eventually got around to build a solution for it. What I came up with is a new ORM that can be used on top of CakePHP.
It sort of works as CakePHP 3.0 with entities that represent an id/model, but it is much more advanced and supports multi table inheritance and single table inheritance.
Check it out: https://github.com/erobwen/Cream

database table design thoughts . .

I have a database structure issue I am looking for some opinions on.
Let's say there is a scenario where users will use an application to request materials.
There is the need to track who the requester is.
There are three possible "types" of requesters. An individual (Person), a Department, and the Supplier supplying the materials themselves.
In addition the Supplier object needs to be related as the Supplier as well.
So the idea is that the Request table has a RequestedByID FK. But the data for each kind of requester is structured so differently that a single requester table would have to be completely denormalized (people have different properties than departments and suppliers).
I have some ideas on how I might handle this but thought the SO community would have some great insight.
Thanks for any and all help.
EDIT:
Pseudo structure:
Request
    RequestID
    RequesterID
Department
    DepartmentID
    DepField1
    DepField2
Person
    PersonID
    PersonField1
    PersonField2
Supplier
    SupplierID
    SuppField1
    SuppField2
Department, Person, and Supplier all have separate tables because they differ in their properties quite a bit. But each of them can serve as the Requester of a Request (RequesterID). What is the best way to accomplish this without one (denormalized table) full of the different possible requesters?
Hope this helps. . .
You need what is known in ER modeling as inheritance (aka. category, subtype, generalization hierarchy etc.), something like this:
This way, it's easy to have different fields and FKs per requester kind, while still having only one REQUEST table. Essentially, you can vary the requester without being forced to also vary the request.
There are generally 3 ways to represent inheritance in the physical database. What you have tried is essentially the strategy #1 (merging all classes in single table), but I'd recommend strategy #3 (every class in separate table).
You could have two different IDs: RequesterID and RequesterTypeID. RequesterTypeID would just be 1, 2, or 3 for Person, Department, and Supplier, respectively, and RequesterTypeID paired with RequesterID would together make a multi-attribute primary key.
What Jack Radcliffe suggested is probably the best option. So I'd just add an alternative option:
You might also consider having 3 request tables: one for people's requests, one for suppliers' requests, and one for departments' requests. That way you don't need to explicitly store the RequesterTypeID, since you can deduce it from the name of the table. You can then create the table Jack Radcliffe described as a view, by "uniting" all 3 individual tables.
Also, if you implement Jack Radcliffe's approach, you can create 3 views to simulate the 3 tables I've mentioned. You can then use whichever table/view is best for each situation, and switching from approach A to B is really easy too.
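Here is a rough sketch of the "type id plus view" idea, run through Python's sqlite3 as a stand-in database. Person, Department, Supplier, Request and the ID columns come from the question's pseudo structure; the view name Requester, the Label column and the sample rows are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person     (PersonID     INTEGER PRIMARY KEY, PersonField1 TEXT);
CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DepField1    TEXT);
CREATE TABLE Supplier   (SupplierID   INTEGER PRIMARY KEY, SuppField1   TEXT);

-- 1 = Person, 2 = Department, 3 = Supplier, as suggested above.
CREATE TABLE Request (
    RequestID       INTEGER PRIMARY KEY,
    RequesterTypeID INTEGER NOT NULL CHECK (RequesterTypeID IN (1, 2, 3)),
    RequesterID     INTEGER NOT NULL
);

-- Hypothetical view "uniting" the three tables under one shape.
CREATE VIEW Requester AS
    SELECT 1 AS RequesterTypeID, PersonID AS RequesterID, PersonField1 AS Label
    FROM Person
    UNION ALL
    SELECT 2, DepartmentID, DepField1 FROM Department
    UNION ALL
    SELECT 3, SupplierID, SuppField1 FROM Supplier;

INSERT INTO Person     VALUES (1, 'Ann');
INSERT INTO Department VALUES (1, 'Facilities');
INSERT INTO Supplier   VALUES (1, 'Acme');
INSERT INTO Request    VALUES (100, 1, 1), (101, 2, 1), (102, 3, 1);
""")

# The (RequesterTypeID, RequesterID) pair together identifies the requester.
rows = conn.execute("""
    SELECT r.RequestID, q.Label
    FROM Request r
    JOIN Requester q
      ON q.RequesterTypeID = r.RequesterTypeID
     AND q.RequesterID     = r.RequesterID
    ORDER BY r.RequestID
""").fetchall()
```

Note that nothing in this sketch stops a Request from pointing at a requester that does not exist; the pair is not enforced as a foreign key, which is one reason the inheritance approach above may be preferable.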
What I like about Jack Radcliffe's idea is that if you store the types in a separate table, or make the SQL statement generic enough to handle any type number passed in by the application, the types can be expanded later (e.g. manufacturer, entity, subsidiary, etc.).
However you choose to do it, the expansion will entail overhead.

Database design rules to follow for a programmer

We are working on a mapping application that uses Google Maps API to display points on a map. All points are currently fetched from a MySQL database (holding some 5M + records). Currently all entities are stored in separate tables with attributes representing individual properties.
This presents following problems:
1. Every time there's a new property we have to make changes in the database, application code and the front-end. This is all fine but some properties have to be added for all entities so that's when it becomes a nightmare to go through 50+ different tables and add new properties.
2. There's no way to find all entities which share any given property e.g. no way to find all schools/colleges or universities that have a geography dept (without querying schools, uni's and colleges separately).
3. Removing a property is equally painful.
4. No standards for defining properties in individual tables. Same property can exist with different name or data type in another table.
5. No way to link or group points based on their properties (somehow related to point 2).
We are thinking to redesign the whole database but without DBA's help and lack of professional DB design experience we are really struggling.
Another problem we're facing with the new design is that there are lot of shared attributes/properties between entities.
For example:
An entity called "university" has 100+ attributes. Other entities (e.g. hospitals,banks,etc) share quite a few attributes with universities for example atm machines, parking, cafeteria etc etc.
We don't really want to have properties in separate tables [and then link them back to entities w/ foreign keys] as it would require us to add/remove them manually. Also, generalizing properties will result in groups containing 50+ attributes. Not all records (i.e. entities) require those properties.
So with keeping that in mind here's what we are thinking about the new design:
Have separate tables for each entity containing some basic info e.g. id,name,etc etc.
Have 2 tables attribute type and attribute to store properties information.
Link each entity (or a table if you like) to attribute using a many-to-many relation.
Store addresses in different table called addresses link entities via foreign keys.
We think this will allow us to be more flexible when adding, removing or querying on attributes.
This design, however, will result in an increased number of joins when fetching data, e.g. to display all "attributes" for a given university we might have a query with 20+ joins to fetch all related attributes in a single row.
We desperately need to know some opinions or possible flaws in this design approach.
Thanks for your time.
In trying to generalize your question without more specific examples, it's hard to truly critique your approach. If you'd like some more in depth analysis, try whipping up an ER diagram.
If your data model is changing so much that you're constantly adding/removing properties and many of these properties overlap, you might be better off using EAV.
Otherwise, if you want to maintain a relational approach but are finding a lot of overlap with properties, you can analyze the entities and look for abstractions that link to them.
Ex) My Db has Puppies, Kittens, and Walruses all with a hasFur and furColor attribute. Remove those attributes from the 3 tables and create a FurryAnimal table that links to each of those 3.
Of course, the simplest answer is to not touch the data model. Instead, create Views on the underlying tables that you can use to address (5), (4) and (2)
1 cannot be an issue. There is one place where your objects are defined. Everything else is generated/derived from that. Just refactor your code until this is the case.
2 is solved by having a metamodel, where you describe which properties are where. This is probably needed for 1 too.
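As one possible reading of "metamodel", the database's own catalog already records which properties live in which tables, and a query can be generated from it. The sketch below uses Python's sqlite3 with sqlite_master and PRAGMA table_info as a stand-in (in MySQL, information_schema.columns plays a similar role). The table names, the has_geography_dept and parking columns, and both helper functions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE schools      (id INTEGER PRIMARY KEY, name TEXT,
                           has_geography_dept INTEGER);
CREATE TABLE universities (id INTEGER PRIMARY KEY, name TEXT,
                           has_geography_dept INTEGER, parking INTEGER);
CREATE TABLE banks        (id INTEGER PRIMARY KEY, name TEXT, parking INTEGER);

INSERT INTO schools      VALUES (1, 'Hill School', 1), (2, 'Lake School', 0);
INSERT INTO universities VALUES (1, 'State University', 1, 1);
INSERT INTO banks        VALUES (1, 'First Bank', 1);
""")


def tables_with(column):
    """Ask the catalog which tables define the given column."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    found = []
    for table in tables:
        # table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [c[1] for c in conn.execute(f"PRAGMA table_info({table})")]
        if column in columns:
            found.append(table)
    return found


def entities_with(column):
    """Build one UNION ALL query over every table that has the property."""
    # Table and column names are only used after being matched against the
    # catalog, so interpolating them into the SQL text is safe here.
    parts = [
        f"SELECT '{table}' AS entity, name FROM {table} WHERE {column} = 1"
        for table in tables_with(column)
    ]
    if not parts:
        return []
    sql = " UNION ALL ".join(parts) + " ORDER BY entity, name"
    return conn.execute(sql).fetchall()


geography = entities_with("has_geography_dept")
```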
You might want to totally avoid the problem by programming this in Smalltalk with Seaside on a Gemstone object oriented database. Then you can just have objects with collections and don't need so many joins.

Table "Inheritance" in SQL Server

I am currently looking at restructuring our contact management database and I wanted to hear people's opinions on solving the problem of a number of contact types having shared attributes.
Basically we have 6 contact types which include Person, Company and Position # Company.
In the current structure all of these have an address however in the address table you must store their type in order to join to the contact.
This consistent requirement to join on contact type gets frustrating after a while.
Today I stumbled across a post discussing "Table Inheritance" (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server).
Basically you have a parent table and a number of sub tables (in this case one per contact type). From there you enforce integrity so that a sub table row must have a master equivalent where its type is defined.
The way I see it, by this method I would no longer need to store the type in tables like address, as the id is unique across all types.
I just wanted to know if anybody had any feelings on this method, whether it is a good way to go, or perhaps alternatives?
I'm using SQL Server 05 & 08 should that make any difference.
Thanks
Ed
I designed a database just like the link you provided suggests. The case was to store the data for many different technical reports. The number of report types is undefined and will probably grow to about 40 different types.
I created one master report table, that has an autoincrement primary key. That table contains all common information like customer, testsite, equipmentid, date etc.
Then I have one table for each report type that contains the specific information relating to that report type. That table has the same primary key as the master and references the master as well.
My idea for splitting this into different tables with a 1:1 relation (which normally would be a no-no) was to avoid getting one single table with a huge number of columns, which gets very difficult to maintain as you're constantly adding columns.
My design with table inheritance gave me segmented data and expandability without being difficult to maintain. The only thing I had to do was to write a special save method to handle writing to two tables automatically. So far I'm very happy with the design and haven't really found any drawbacks, except for a slightly more complicated save method.
Google on "gen-spec relational modeling". You'll find a lot of articles discussing exactly this pattern. Some of them focus on table design, while others focus on an object oriented approach.
Table inheritance pops up in a few of them.
I know this won't help much now, but initially it may have been better to have an Entity table rather than 6 different contact types. Then each Entity could have as many addresses as necessary and there would be no need for type in the join.
You'll still have the problem that if you want the sub-type fields and you have only the master contact, you'll have to know what table to go looking at - or else join to all of them. But otherwise this is a workable solution to a common problem.
Another possibility (fairly similar in structure, but different in how you think of it) is to simply put all your contacts into one table. Then for the more specific fields (birthday say for people and department for position#company) create separate tables that are associated with that contact.
Contact Table
--------------
Name
Phone Number
Address Table
-------------
Street / state, etc
ContactId
ContactBirthday Table
--------------
Birthday
ContactId
Departments Table
-----------------
Department
ContactId
It requires a different way of thinking of things though - instead of thinking of people vs. companies, you think of the various functional requirements for the task at hand - if you want to send out birthday cards, get all the contacts that have birthdays associated with them, etc..
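To illustrate the birthday-card case, here is a small sketch using Python's sqlite3 as a stand-in. The table names follow the outline above; the ContactId key on Contact, the AddressId column, the column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Contact (
    ContactId   INTEGER PRIMARY KEY,
    Name        TEXT NOT NULL,
    PhoneNumber TEXT
);
CREATE TABLE Address (
    AddressId INTEGER PRIMARY KEY,
    ContactId INTEGER NOT NULL REFERENCES Contact(ContactId),
    Street    TEXT NOT NULL
);
-- Only contacts that have a birthday get a row here.
CREATE TABLE ContactBirthday (
    ContactId INTEGER PRIMARY KEY REFERENCES Contact(ContactId),
    Birthday  TEXT NOT NULL
);
-- Only contacts that belong to a department get a row here.
CREATE TABLE Departments (
    ContactId  INTEGER PRIMARY KEY REFERENCES Contact(ContactId),
    Department TEXT NOT NULL
);

INSERT INTO Contact VALUES
    (1, 'Ed', '555-0100'),
    (2, 'Acme Ltd', '555-0101'),
    (3, 'Sales Lead at Acme', NULL);
INSERT INTO Address VALUES (1, 1, '1 Main St'), (2, 2, '2 High St');
INSERT INTO ContactBirthday VALUES (1, '1980-05-17');
INSERT INTO Departments VALUES (3, 'Sales');
""")

# "Who gets a birthday card?" starts from the birthday table, not from a type.
birthday_card_list = conn.execute("""
    SELECT c.Name, a.Street, b.Birthday
    FROM ContactBirthday b
    JOIN Contact c      ON c.ContactId = b.ContactId
    LEFT JOIN Address a ON a.ContactId = c.ContactId
    ORDER BY c.Name
""").fetchall()
```

Because Address points only at ContactId, no contact type is needed in the join.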
I'm going to go out on a limb here and suggest you should rethink your normalization strategy (as you seem to be lucky enough to be able to rethink your schema quite fundamentally). If you typically store an address for each contact, then your contact table should have the address fields in it. Alternatively if the address is stored per company then the address should be stored in the company table and your contacts linked to that company.
If your contacts only have one address, or one (or even 3, just not 'many') instance of the other fields, think about rationalizing them into a single table. In my experience having a few null fields is a far better alternative than needing left joins to data you aren't sure exists.
Fortunately for anyone who vehemently disagrees with me you did ask for opinions! :) IMHO you should only normalize when you really need to. Where you are rethinking schemas, denormalization should be considered at every opportunity.
When you have a 7th type, you'll have to create another table.
I'm going to try this approach. Yes, you have to create new tables when you have a new type, but since this table will probably have different columns, you'll end up doing this anyway if you don't use this scheme.
If the tables that inherit the master don't differentiate much from one another, I'd recommend you try another approach.
May I suggest that we just add a Type table. I.e. a person has an address, name etc. Then, as each use case (student, teacher, ...) presents itself, a PersonType table links an entry in the person table to any number of types, and new tables (teacher, alien, singer) are added as the system evolves.
