Database design for a social networking site

What tables would a social networking site (e.g. Twitter) need?
I have a users table so far. How do I keep track of my followers and the people I follow?
Should I maintain separate tables for followers and for the people I follow?
What columns would those tables have?
Please don't treat this as subjective or off-topic. I am a beginner and hoped experts could guide me toward a good DB design.

Have a look at Database Answers, in particular its data models. It has designs for many different kinds of systems, including one for a social networking site, which may give you an idea of what's required.
You may want to search SO for other social network database questions. I found one with a link to Flickr showing a schema that appears to be from Facebook.
Your database design will be based around your system requirements. Without knowing exactly what you are trying to achieve, it is difficult to give you the best design.

You can use this messenger database design concept: Messenger DB

You can create a separate table for follower/followed relationships. When x follows y, create an entry with follower_id = x.id and followed_id = y.id.
To find all the users x follows, query the relationship table with select * from relationships where follower_id = x.id; to find x's followers, filter on followed_id instead.
When x unfollows y, just delete the entry you originally created.
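
A minimal sketch of this relationship table, using Python's built-in sqlite3 module. The table and column names follow the answer above; the helper function names, the users table layout, and the composite primary key (which stops the same follow from being stored twice) are assumptions made for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with users and a follower/followed table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE relationships (
            follower_id INTEGER NOT NULL REFERENCES users(id),
            followed_id INTEGER NOT NULL REFERENCES users(id),
            -- assumption: one row per (follower, followed) pair
            PRIMARY KEY (follower_id, followed_id)
        );
    """)
    return conn


def follow(conn, follower_id, followed_id):
    # INSERT OR IGNORE makes a repeated follow a no-op
    conn.execute(
        "INSERT OR IGNORE INTO relationships (follower_id, followed_id) VALUES (?, ?)",
        (follower_id, followed_id),
    )


def unfollow(conn, follower_id, followed_id):
    # Unfollowing deletes the row created by follow()
    conn.execute(
        "DELETE FROM relationships WHERE follower_id = ? AND followed_id = ?",
        (follower_id, followed_id),
    )


def following(conn, user_id):
    """Ids of the users that user_id follows."""
    rows = conn.execute(
        "SELECT followed_id FROM relationships WHERE follower_id = ? ORDER BY followed_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def followers(conn, user_id):
    """Ids of the users that follow user_id."""
    rows = conn.execute(
        "SELECT follower_id FROM relationships WHERE followed_id = ? ORDER BY follower_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```

A single table answers both questions ("who do I follow" and "who follows me"), so separate followers and following tables are not needed in this sketch.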

Related

How to persist data in microservices?

I am getting started with microservice architectures and have a couple of questions about data persistence and databases.
My understanding is that each microservice has its own database (not necessarily, but usually). Given that, consider a typical social media platform with users, posts and comments. There will be two microservices: a users microservice and a posts microservice. The users database has a users table, and the posts database has posts and comments tables.
My question is about the posts microservice. Each post and comment has an author, so normally we would create a foreign key pointing to the users table, but that table is in a different database. What to do then? From my perspective there are 2 options:
1. Add an authorId column to the table but no foreign key constraint. If so, what happens in the application when we fetch the user's data from the users microservice by authorId and that data is gone?
2. Create an authors table in the posts database. If so, what data should that table contain other than the user's id?
It doesn't feel right to duplicate data that is already in the users database, but it also doesn't feel right to use the user's id without the FK constraint.

One thing to note: the two kinds of data grow at very different rates.
Users -> relatively static data.
Posts & Comments -> dynamic, and could be far larger than the users data.
The two-microservice design looks good. I would prefer option 1 from your design.
Duplication is not necessarily bad. In ordinary database design it is common to denormalize for better read performance. It also decouples the posts service from the users table and lets you choose a different database if required. As for your question about what happens when the user's data is missing but the posts are still available, that can be handled with business logic and API design.
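
As a rough illustration of handling a missing user in business logic, here is a sketch in Python. It assumes option 1 from the question (an author id stored without a foreign key). The UserServiceClient class, the render_post function, and the "[deleted user]" placeholder are all hypothetical names invented for this example; a real client would make a network call to the users microservice.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Post:
    id: int
    author_id: int  # plain reference; no FK constraint across services
    body: str


class UserServiceClient:
    """Hypothetical stand-in for a call to the users microservice."""

    def __init__(self, users: Dict[int, str]):
        self._users = users

    def get_display_name(self, user_id: int) -> Optional[str]:
        # Returns None when the user no longer exists
        return self._users.get(user_id)


def render_post(post: Post, users: UserServiceClient) -> dict:
    """Build the API response, falling back to a placeholder author."""
    name = users.get_display_name(post.author_id)
    return {
        "id": post.id,
        "body": post.body,
        "author": name if name is not None else "[deleted user]",
    }
```

The post stays readable even when its author is gone; whether to show a placeholder, hide the post, or delete it is a product decision rather than a database constraint.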

How does Facebook (or how would you) partition an incredibly large database of friends?

How would you store 10 billion friends where a friend can be anyone's friend? The simplest solution is to create a database, a table called Person and create a many-to-many association between Person and Person.
But this would not scale properly. This data would need to be partitioned across many databases around the world so the load can be properly distributed.
As a software developer who's kind of new to database development, I'm curious how the SO community would solve this problem.

It's likely they don't use a relational database to store this information. It's more likely some flavor of NoSQL, possibly a Graph Database.
There may be blog posts (from Facebook or third parties) that discuss how it is actually done; the above is just my own speculation based on a basic understanding of these kinds of data stores.

Graph Database Design Methodologies

I want to use a graph database for a web application (involving a web of Users, Posts, Comments, Votes, Answers, Documents and Document-Merges, plus some other transitive relationships on Users and Documents). So I started asking myself whether there is something like a design methodology for graph databases, i.e. an analogue of the design principles recommended for relational databases (such as the normal forms).
Example questions (among many that arise):
Is it a good idea to create a top-level Users node that has an "exist" relationship to every User node in the database?
Is it a good idea to build in version management, i.e. to create relationships (something like "follows") pointing to updated versions of a Document / Post, so that traversing the relationship backwards shows the changes the document went through?
etc...
So, do we need a Graph Database Design Cookbook?

The Gremlin User Group (http://tinkerpop.com/) and Neo4j User Group (https://groups.google.com/forum/?fromgroups#!forum/neo4j) are good places to discuss graph-database modeling.
You can create supernodes such as "Users", but it may be better and more performant to use indexes and create an index entry for each user with key=element_type, value="user", id=user_node_id.
A "follows" relation is often used for people/friends, as on Facebook and Twitter, so I wouldn't use that name for versioning. You can build a versioning system into Neo4j that timestamps each entry and uses a last-write-wins algorithm, and there are other database systems, like Datomic, that have this built in.
See Lightbulb's model (https://github.com/espeed/lightbulb/blob/master/lightbulb/model.py) for an example blog model in Bulbs/Python (http://bulbflow.com).
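
The timestamped, last-write-wins versioning idea mentioned above can be sketched in plain Python. This is an in-memory illustration only, not Neo4j or Bulbs code; the Version and VersionedDocument names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    timestamp: float
    content: str


@dataclass
class VersionedDocument:
    """Keeps every write; the current state is the newest timestamp."""

    versions: List[Version] = field(default_factory=list)

    def write(self, content: str, timestamp: float) -> None:
        # Writes are never overwritten, only appended
        self.versions.append(Version(timestamp, content))

    def current(self) -> Optional[Version]:
        # Last-write-wins: the entry with the highest timestamp is current,
        # regardless of the order in which writes arrived
        return max(self.versions, key=lambda v: v.timestamp, default=None)

    def history(self) -> List[Version]:
        # Oldest first, so a reader can walk through the changes
        return sorted(self.versions, key=lambda v: v.timestamp)
```

In a graph database each Version would typically be its own node linked to the document, but the resolution rule (highest timestamp wins) would stay the same.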

How would data be stored in the database for a social networking site?

On a social networking site, a person has friends/followers, forming a chain of relationships. How would the data be stored in the database in this scenario? It is a huge amount of information, yet queries on these sites return very quickly.
Could someone explain the relations between the various entities? What makes the search results so fast?
What type of algorithm is implemented? If possible, give me an example.

This looks like a many-to-many situation.
Something like the following, where UserRelations.UserID and UserRelations.UserRelatedID should be foreign keys to Users:
Users
- UserID
- UserName
UserRelations
- UserID
- UserRelatedID
- RelationshipType
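
A hedged sketch of these two tables with the foreign keys actually enforced, using Python's sqlite3 module. The table and column names come from the answer above; the column types, the composite primary key, and the helper functions relate and related_names are assumptions for illustration.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Users (
            UserID   INTEGER PRIMARY KEY,
            UserName TEXT NOT NULL
        );
        CREATE TABLE UserRelations (
            UserID           INTEGER NOT NULL REFERENCES Users(UserID),
            UserRelatedID    INTEGER NOT NULL REFERENCES Users(UserID),
            RelationshipType TEXT    NOT NULL,
            -- assumption: a pair may have several relationship types
            PRIMARY KEY (UserID, UserRelatedID, RelationshipType)
        );
    """)
    return conn


def relate(conn, user_id, related_id, relationship_type):
    """Record that user_id has the given relationship to related_id."""
    conn.execute(
        "INSERT INTO UserRelations (UserID, UserRelatedID, RelationshipType) "
        "VALUES (?, ?, ?)",
        (user_id, related_id, relationship_type),
    )


def related_names(conn, user_id, relationship_type):
    """Names of the users related to user_id by the given type."""
    rows = conn.execute(
        "SELECT u.UserName FROM UserRelations r "
        "JOIN Users u ON u.UserID = r.UserRelatedID "
        "WHERE r.UserID = ? AND r.RelationshipType = ? "
        "ORDER BY u.UserName",
        (user_id, relationship_type),
    )
    return [row[0] for row in rows]
```

With the pragma enabled, inserting a relation to a user id that does not exist raises sqlite3.IntegrityError, which is the protection the foreign keys are meant to provide.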

Facebook developed a custom NoSQL database for their inbox search, which has since been open-sourced as Apache Cassandra: http://en.wikipedia.org/wiki/Apache_Cassandra.
There are some interesting topics related to your question on Facebook's engineering page: http://www.facebook.com/Engineering.

Help designing database for a twitter web application

I'm trying to create a small website in ASP.NET MVC that uses Twitter. I want to pull some information about Twitter users and store it in a database, which I will update periodically.
I am using the following tables:
Users
user_id - uses the twitter id (int)
twitter_name - nvarchar(255)
last_updated - datetime
TwitterData
user_id
date
num_tweets
num_favorites
num_lists
Unfortunately I'm not really good with databases, so is this a good design? Can I use one table instead?
Thanks in advance,
Sasha

If there is a 1-1 relationship between a record in Users and a record in TwitterData, then you could use a single table. If you were going to have other kinds of data (FacebookData, for example), then you'd keep the two tables, but probably move twitter_name to TwitterData.
Read this for an introduction to Normalization, which will help you get started in designing tables.
http://en.wikipedia.org/wiki/Database_normalization
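
As a rough illustration of the two-table layout from the question, here is a sketch using Python's sqlite3 module. It assumes that TwitterData keeps one row per user per date (the question has a date column and mentions periodic updates, but does not state this outright), in which case the relationship is one-to-many and two tables are the natural fit. Column names come from the question; the SQLite column types and the helper function names are assumptions.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Users (
            user_id      INTEGER PRIMARY KEY,  -- the Twitter id
            twitter_name TEXT NOT NULL,
            last_updated TEXT
        );
        CREATE TABLE TwitterData (
            user_id       INTEGER NOT NULL REFERENCES Users(user_id),
            date          TEXT    NOT NULL,    -- ISO date, e.g. 2024-01-02
            num_tweets    INTEGER NOT NULL,
            num_favorites INTEGER NOT NULL,
            num_lists     INTEGER NOT NULL,
            -- assumption: at most one snapshot per user per day
            PRIMARY KEY (user_id, date)
        );
    """)
    return conn


def record_snapshot(conn, user_id, date, num_tweets, num_favorites, num_lists):
    """Store (or overwrite) the snapshot for that day and stamp the user."""
    conn.execute(
        "INSERT OR REPLACE INTO TwitterData "
        "(user_id, date, num_tweets, num_favorites, num_lists) VALUES (?, ?, ?, ?, ?)",
        (user_id, date, num_tweets, num_favorites, num_lists),
    )
    conn.execute(
        "UPDATE Users SET last_updated = ? WHERE user_id = ?", (date, user_id)
    )


def latest_snapshot(conn, user_id):
    """Most recent (date, num_tweets, num_favorites, num_lists) row, or None."""
    return conn.execute(
        "SELECT date, num_tweets, num_favorites, num_lists FROM TwitterData "
        "WHERE user_id = ? ORDER BY date DESC LIMIT 1",
        (user_id,),
    ).fetchone()
```

If only the latest counts matter and no history is kept, the counters could instead live as columns on Users, which is the single-table case described above.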

That looks like a good start to me. You can use the Twitter API as a model for your data schema.
If you want to store normalized data, I would go with separate tables for Users and their Tweets.