SQL Server: Friends class "facebook like" - sql-server

I need one user to be able to be a friend of another user.
So I think I can use two tables:
table User (Id, Name, Date, Etc.)
table Friends (UserId, Friend_UserId)
With these tables, I can record that one user is a friend of another user.
The problem is that for 1000 users there could be 1000*1000 = 1,000,000 rows.
I don't know how many rows can be stored in SQL Server, but my question is:
what happens with 1 million users? Will SQL Server crash?
Can anyone share experience with a Friends class (I remember that was the name of the concept in OOP)?

In your example, 1000*1000 means every user is friends with every other user. Why store the associations at all?
Is that really going to be the case? Or are you just testing the theoretical limit?
And are you going to store two copies of every association? Say user 1 and User 2 are friends. Are you going to have a table with both of these rows:
UserId, Friend_UserId
1 2
2 1
That would double the amount of data needed. You should make it a rule that the lower of the two IDs is always in one column.
And of course, make sure you index both columns.
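A minimal sketch of that layout (assuming the user table is called [User] with an Id primary key, as in the question; constraint and index names are made up):

CREATE TABLE Friends (
    UserId        INT NOT NULL REFERENCES [User](Id),
    Friend_UserId INT NOT NULL REFERENCES [User](Id),
    CONSTRAINT PK_Friends PRIMARY KEY (UserId, Friend_UserId),
    -- store each friendship only once: the lower id always goes in UserId
    CONSTRAINT CK_Friends_Order CHECK (UserId < Friend_UserId)
);

-- the primary key covers lookups by UserId; this index covers lookups from the other side
CREATE INDEX IX_Friends_FriendUserId ON Friends (Friend_UserId);

To list all of a user's friends you then query both columns (WHERE UserId = @Id OR Friend_UserId = @Id), since each pair is stored only once.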

You can store 1000 users and friends on a server, you can store 10,000 users and friends on a server, you can maybe store 1M users and friends on a server, but for sure you won't be able to store 100M users and friends and still have decent latency for requests that read/write the DB. With social web-scale apps there is simply no single box that can handle a successful app. This is why the name of the game is scale-out and sharding. You can choose to be pragmatic and ignore the problem for now, or you can choose to design for it. Also, see Sharding With SQL Azure; it is worth reading up on other alternatives too.

One of the main purposes of a good SQL server is precisely to hold, and query, many rows. There is no problem (beyond physical server resources, such as HDD space) preventing 1,000,000 rows in a table.

but my question is what happens with 1 million users? Will SQL Server crash?
Yes, if you run it on your mobile phone.
Run it on appropriate hardware and it can easily handle hundreds of billions of rows in a table. Data warehouse projects do that all the time.

Related

database design for large number of users

Which one is better, a or b?
a) 7 tables for each user, e.g. user7messages, user7mail, etc. In this case, if we have 1000 users there will be 7000 tables.
b) 7 tables, e.g. messages, mails, etc. All the messages or mails of every user will be in the same table.
In this case, for 1000 users we have only 7 tables.
In most cases, on modern hardware and with reasonable tuning, your database should be able to support tens of millions of records without too much pain, as long as your data really is relational. If you're searching for text, or storing hierarchical data, or storing documents, or running reports, there are alternative options (e.g. NoSQL).
Where at all possible, stick with the orthodox way of using relational databases; that means normalization, query tuning, using caches and throwing hardware at the problem.
Only once you've proven you have a performance problem is it worth looking at more exotic solutions. Within RDBMS world, that might mean partitioning the data (sorta kinda similar to your "table per user" idea). Alternatively, you might jump to NoSQL.
The problem with your "table per user" strategy is that you gain almost no benefit when querying by index (on a modern RDBMS, searching a table with 1 row or a table with a million rows makes almost no difference when you hit the index). For actions that don't hit the index, you should see a decent gain - but that's usually a sign you're not really relational in the first place...
It makes developing the client application rather error prone, and more complicated than it needs to be, especially when creating moderately complex SQL queries (e.g. multi-table joins) - and tuning those queries will become much harder as a result. You won't be able to use the tools available to manage database queries (e.g. ORM tools), as these are all based on the "standard" relational model.
The biggest problem is changing the database - if you have to add an attribute to "message", you have to repeat that change over 7000 tables. You'll either spend a lot of time writing custom database management scripts, or have a human being repeat the same thing thousands of times (and make hard-to-spot mistakes).
Case B will be much better; just make sure that your users have a user_id type field that increments automatically, and link your tables together via that ID, e.g.
user_id email
1000 hello
This will improve lookup speed because you do not have to include functionality to pick a specific piece of data out of a search across thousands of tables (in this case it would mean searching the columns of table after table until the right table with the right column was found, which would be ludicrous).
But if you are searching a specific table (e.g. you only need messages), only one table is involved in the lookup, which is much faster and makes the tables easier to manage at an admin level.
An even better idea would be one table with several columns, say a 'communications' table, which could look like:
user_id email messages
1000 hello hi
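A rough sketch of case B as described above (MySQL-style syntax, since the thread doesn't name an engine; table and column names are assumptions):

CREATE TABLE users (
    user_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    email   VARCHAR(255) NOT NULL
);

CREATE TABLE messages (
    message_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id    INT NOT NULL,
    body       TEXT,
    INDEX idx_messages_user (user_id),
    FOREIGN KEY (user_id) REFERENCES users (user_id)
);

-- every user's messages live in the same table; the index makes this a cheap lookup
SELECT body FROM messages WHERE user_id = 1000;

The mails table (and the other five) would follow the same pattern, each linked back by user_id.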

One large SQL Server table or several smaller ones?

I have a design question to ask.
Suppose I have a TBL_SESSIONS table where I keep every logged-in user ID and a dirty flag (indicating whether the content the user has was changed since they got it).
Now, every signed-in user polls this table every few seconds for that dirty flag, i.e. there are a lot of read calls on that table, and the dirty flag is also changed a lot.
Suppose I'm expecting many users to be logged in at the same time. I was wondering if there is any reason to create, say, 10 such tables and have the users distributed (say, according to their user IDs) between those tables.
I'm asking this from two aspects: first, in terms of performance; second, in terms of scalability.
I would love to hear your opinions.
For 1m rows use a single table.
With correct indexing the access time should not be a problem. If you use multiple tables with users distributed across them you will have additional processing necessary to find which table to read/update.
Also, how many users will you allocate to each table? And what happens when the number of users grows... add more tables? I think you'll end up with a maintenance nightmare.
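A minimal sketch of the single-table approach (the dirty flag modelled as a BIT; column names are assumptions, and @UserId stands for the query parameter):

CREATE TABLE TBL_SESSIONS (
    UserId    INT NOT NULL PRIMARY KEY,   -- one row per signed-in user
    IsDirty   BIT NOT NULL DEFAULT 0,
    LastLogin DATETIME NULL
);

-- the polling query is a single-row seek on the primary key
SELECT IsDirty FROM TBL_SESSIONS WHERE UserId = @UserId;

-- so is the update that sets the flag
UPDATE TBL_SESSIONS SET IsDirty = 1 WHERE UserId = @UserId;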
Having several tables makes sense if you have a really large number of records, like billions, and a load your server cannot handle. In that case you can perform sharding: split one table into several on different servers, e.g. keep the first 100 million records on server A (IDs 1 to 100,000,000), the next 100 million on server B (100,000,001 to 200,000,000), etc. The other way is to keep the same data on different servers and query it through some kind of balancer, which may be harder (replication isn't a key feature of every RDBMS, so it depends on the engine).
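A sketch of that range-sharding idea (hypothetical table definitions in SQL Server syntax; the routing by user ID would live in the application or the balancer):

-- on server A
CREATE TABLE TBL_SESSIONS (
    UserId  INT NOT NULL PRIMARY KEY,
    IsDirty BIT NOT NULL DEFAULT 0,
    CONSTRAINT CK_Sessions_ServerA CHECK (UserId BETWEEN 1 AND 100000000)
);

-- on server B the same table carries CHECK (UserId BETWEEN 100000001 AND 200000000),
-- and the application picks the server from the user ID before issuing the query.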

PHP Geek needs Database Design Assistance

Like most developers I think I am always striving to create the most optimal code and database schemas.
However, I've got the feeling that I'm over-engineering the database schema I want to create.
I have a web app that, in a short space of time, will hold a lot of users. The users are in the form of customers, suppliers, and system users. It's in an industry that is likely to grow rapidly.
In previous schemas I had those users separated into different tables.
However, I am now thinking of going down the route of having one table called PEOPLE.
There will be these tables:
People,
Contact Details,
Residences
They are related via pivot tables, i.e.:
PivotContacts
PivotResidences.
My question: is this considered good or bad design?
Am I overthinking and over-engineering a simple setup?
The People table will grow exponentially and will hold a LOT of data - and other tables will relate to it.
I would really welcome opinions.
Will my design scale to 100 thousand records and maintain moderate speed? (It will initially start with 1000 records and will likely grow to approximately 100,000 in 1 year.)
For users that can log in, and may be tracked (last login, failed password retries), it is optimal to have a small table and maybe a separate table for writing (a distinction between reading and writing data).
Any table with people in it has a tendency to collect a tremendous number of fields. Functional distinctions kept in different tables keep the data tidy; indexing on a suppliers table is nicer/maybe more optimal, as are changes to supplier data. SQL JOINs are manageable, and could be done with SQL views.
So I would go for a thin base table People and 1:1 tables SupplierPeople, SystemUsingPeople and so on. And consider which changes actually happen: how often tables are updated, inserted into, and read.
Also consider what it takes to modify the database schema, such as adding a field.
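A sketch of that layout (the column definitions are assumptions; the 1:1 tables share the base table's primary key):

CREATE TABLE People (
    PersonId INT NOT NULL PRIMARY KEY,
    Name     VARCHAR(200) NOT NULL
);

CREATE TABLE SupplierPeople (
    PersonId     INT NOT NULL PRIMARY KEY,
    SupplierCode VARCHAR(50),
    FOREIGN KEY (PersonId) REFERENCES People (PersonId)
);

CREATE TABLE SystemUsingPeople (
    PersonId  INT NOT NULL PRIMARY KEY,
    LastLogin DATETIME NULL,
    FOREIGN KEY (PersonId) REFERENCES People (PersonId)
);

Adding a supplier-only field then touches only SupplierPeople, and a query that never needs supplier data never reads it.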
If you're only worried about the scalability of your solution, 100K records is not a particularly large number, subject to some (important) assumptions.
Modern database software (I assume you're going to use MySQL as you say you're a PHP hand) running on modern hardware can easily handle databases with millions of records, as long as you have a well-designed table layout, and can use indices.
Your design - linking "people" to "contacts" and "residences" can use primary/foreign keys to join; that should easily scale to your requirements.
It's worth considering the likely queries you're going to be running, though - I'm guessing you will need to be able to search "people" by name, or by address, or by city, or by last contact date, etc. This suggests you may need free-text searching - once you get to large numbers of records, using where name like '%Jones%' can be slow.
You may also want to consider archiving/history strategies - do you need to store the history of someone's residences (so you can find out where they lived when they placed the order)?
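A sketch of the kind of join and index involved (assuming MySQL; the pivot table names come from the question, while the column names are made up):

CREATE INDEX idx_people_name ON People (Name);

-- a prefix search can use the index on Name
SELECT p.Name, r.City
FROM People p
JOIN PivotResidences pr ON pr.PersonId = p.PersonId
JOIN Residences r ON r.ResidenceId = pr.ResidenceId
WHERE p.Name LIKE 'Jones%';

-- WHERE p.Name LIKE '%Jones%' cannot use that index and falls back to a scan;
-- for that kind of search a full-text index is usually the better tool.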

Database and design assistance for large number of simple records

I'm hoping to get some help choosing a database and layout well suited to a web application I have to write (outlined below). I'm a bit stumped, given the large number of records and the fact that they need to be queryable in any manner.
The web app will basically allow querying of a large number of records using any combination of the criteria that make up a record; the date is the only mandatory item. A record consists of only eight items (below), but there will be about three million new records a day, with very few duplicate records. Data will be constantly inserted into the database in real time for the current day.
I know the biggest interest will be in the last 6 months to 1 year's worth of data, but the rest will still need to be available for the same types of queries.
I'm not sure what database is best suited for this, nor how to structure it. The database will be on a reasonably powerful server. I basically want to start with a good db design and see how the queries perform. I can then judge whether I'd rather do optimizations or throw more powerful hardware at it. I just don't want to have to redo the base db design; it's fine if we initially end up doing a lot of optimization, since we have time but not $$$.
We need to use something open source, not something like Oracle. Right now I'm leaning towards Postgres.
A record consists of:
1 Date
2 unsigned integer
3 unsigned integer
4 unsigned integer
5 unsigned integer
6 unsigned integer
7 Text 16 chars
8 Text 255 chars
I'm planning on creating yearly schemas, monthly tables, and indexing the record tables on date for sure.
I'll probably be able to add another index or two after I analyze usage patterns to see what the most popular queries are. I can do lots of tricks on the app side as far as caching popular queries and whatnot; it's really the db side I need assistance with. Field 8 will have some duplicate values, so I'm planning on having that column be an id into a lookup table to join on. Beyond that I guess the remaining fields will all be in one monthly table...
I could break it into weekly tables as well, I suppose, and use a view for queries so the app doesn't have to deal with trying to assemble a complex query...
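A rough sketch of the layout described above, assuming PostgreSQL (all table and column names are made up):

-- lookup table for the repeated 255-char text (field 8)
CREATE TABLE field8_lookup (
    id    serial PRIMARY KEY,
    value varchar(255) UNIQUE NOT NULL
);

-- one table per month, e.g. records_2012_01
CREATE TABLE records_2012_01 (
    rec_date   date NOT NULL,
    val1       bigint,   -- Postgres has no unsigned integer type, so bigint covers the range
    val2       bigint,
    val3       bigint,
    val4       bigint,
    val5       bigint,
    short_text varchar(16),
    field8_id  integer REFERENCES field8_lookup (id)
);
CREATE INDEX idx_records_2012_01_date ON records_2012_01 (rec_date);

-- a view can hide the monthly split from the application
CREATE VIEW records AS
SELECT * FROM records_2012_01;
-- add a UNION ALL branch per additional monthly table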
anyway, thanks very much for any feedback or assistance!
Some brief advice ...
3 million records a day is a lot! (At least I think so; others might not even blink at that.) I would try to write a tool to insert dummy records and see how something like Postgres performs with one month's worth of data.
It might be best to look into NoSQL solutions, which give you the open source plus the scalability. Look at Couchbase and Mongo to start. If you are keeping a month's worth of data online for real-time querying, I'm not sure how Postgres will handle 90 million records. Maybe great, but maybe not.
Consider having "offline" databases in whatever system you decide on. You keep the real time stuff on the best machines and it's ready to go, but you move older data out to another server that is cheaper (read: slower). This way you can always answer queries, but some are faster than others.
In my experience, using primarily Oracle with a similar record insert frequency (several ~billion row tables), you can achieve good web app query performance by carefully partitioning your data (probably by date, in your case) and indexing your tables. How exactly you approach your database architecture will depend on a lot of factors, but there are plenty of good resources on the web for getting help with this stuff.
It sounds like your database is relatively flat, so perhaps another database solution would be better, but Oracle has always worked well for me.

database performance

Say there is a website with 100,000 users, each with up to 1000 unique strings attached to them, so that there are at most 100,000,000 strings in total.
Would it be better to have one table where each string is one record along with its owner's ID, so that you end up with one table of 100,000,000 records with 2 fields (text and user ID)?
Or to have 100,000 tables, one table for each user, where the table's name is the user's ID, and then 1000 records in each table with just one field (the text)?
Or, instead of storing the strings in a database (there would be a character limit of about the length of an SMS message), to just store links to text files, where there are 100,000,000 text files in a directory, each with a unique name (random numbers and/or letters) containing one of the strings? (Or where each user has a directory and their strings are in that directory?)
Which would be the most efficient option, the directory or the database, and which sub-option of those would be the most efficient?
(This question is obviously theoretical in my case, but what does a site like Twitter do?)
(By efficiency I mean using the least amount of resources and time.)
Or have 100,000 tables
For the love of $DEITY, no! This will lead to horrible code - it's not what databases are designed for.
You should have one table with 100,000,000 records. Database servers are built to handle large tables, and you can use indexes and partitioning etc to improve performance if necessary.
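A minimal sketch of that single table (MySQL-style syntax, since a later answer mentions MySQL; the names and the length limit are assumptions based on the question):

CREATE TABLE user_strings (
    id      BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    body    VARCHAR(160) NOT NULL,   -- roughly SMS-length, as described in the question
    INDEX idx_user_strings_user (user_id)
);

-- all of one user's strings come back with a single index seek
SELECT body FROM user_strings WHERE user_id = 42;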
Option #1
It would be easier to store one table with a user id and the text. It would not be more efficient to create a table for every user.
Though in practice you would want something like a Mongo sharded cluster instead of a lone server running MySQL.
You'd have one table, with indexes on the USER_ID.
For speed, you can partition the table, duplicate it, use caching, cloud, sharding, ...
Please consider NoSQL databases: http://nosql-database.org/
Definitely one table, filled with records based on a key. The OS will crawl with a directory structure of 100,000 file names to sort through... the directory management alone will KILL your performance (at the OS level).
It depends on how much activity the server has to handle.
A few months ago we built a system that indexed ~20 million Medline article abstracts, each of which is longer than your Twitter-style message.
We put the stuff in a single Lucene index that was ~40 GB in size.
Even though we had bad hardware (2 GB RAM and no SSD drives - poor interns), we were able to run searches for ~3 million terms against the database in a few days.
A single table (or Lucene index) should be the way to go.
