One large SQL Server table or several smaller ones? - sql-server

I have a design question to ask.
Suppose I have a TBL_SESSIONS table where I keep every logged-in user-id and a Dirty flag (indicating whether the content the user has was changed since they got it).
Now, this table will be polled every few seconds by every signed-in user to check that Dirty flag, i.e., there are a lot of read calls on that table, and the Dirty flag also changes a lot.
Suppose I'm expecting many users to be logged in at the same time. I was wondering whether there is any reason to create, say, 10 such tables and distribute the users (say, according to their user-ids) between those tables.
I'm asking this from two aspects: first, in terms of performance; second, in terms of scalability.
I would love to hear your opinions.

For 1m rows use a single table.
With correct indexing the access time should not be a problem. If you use multiple tables with users distributed across them, you will need additional processing to find which table to read or update.
Also, how many users will you allocate to each table? And what happens when the number of users grows... add more tables? I think you'll end up with a maintenance nightmare.
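As a rough sketch of what "correct indexing" could look like for the session table described above (the column names and the polling pattern are my assumptions, not from the question):

    -- Hypothetical session table: one narrow row per signed-in user,
    -- clustered on the user id so every poll is a single-row seek.
    CREATE TABLE dbo.TBL_SESSIONS (
        UserId     INT       NOT NULL PRIMARY KEY,
        IsDirty    BIT       NOT NULL DEFAULT 0,
        LastChange DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );

    DECLARE @UserId INT = 42;   -- example user id

    -- The poll every few seconds is a single-row seek on the clustered key:
    SELECT IsDirty FROM dbo.TBL_SESSIONS WHERE UserId = @UserId;

    -- Marking a user's content as changed is likewise a single-row update:
    UPDATE dbo.TBL_SESSIONS
    SET IsDirty = 1, LastChange = SYSUTCDATETIME()
    WHERE UserId = @UserId;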

Having several tables makes sense if you have a really large number of records, like billions, and a load your server cannot handle. In that case you can perform sharding: split one table into several on different servers, e.g. keep the first 100 million records on server A (id 1 to 100 000 000), the next 100 million on server B (100 000 001 to 200 000 000), etc. The other way is having the same data on different servers and querying it through some kind of balancer, which may be harder (replication isn't a key feature of an RDBMS, so it depends on the engine).
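If you did go down the id-range sharding road in SQL Server, one hedged way to express it is a distributed partitioned view; the sketch below assumes linked servers named ServerA and ServerB and a database MyDb, all invented names:

    -- On server A: a member table that owns the first id range;
    -- the CHECK constraint is what lets the optimizer prune shards.
    CREATE TABLE dbo.Users_Range1 (
        id   INT NOT NULL PRIMARY KEY CHECK (id BETWEEN 1 AND 100000000),
        name NVARCHAR(100) NOT NULL
    );
    -- (On server B: dbo.Users_Range2 with CHECK (id BETWEEN 100000001 AND 200000000), and so on.)
    GO

    -- The view stitches the shards together through linked servers; a query
    -- with WHERE id = ... only touches the shard whose CHECK range matches.
    CREATE VIEW dbo.Users AS
        SELECT id, name FROM ServerA.MyDb.dbo.Users_Range1
        UNION ALL
        SELECT id, name FROM ServerB.MyDb.dbo.Users_Range2;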

Related

Database design for a large number of users

Which one is better, a or b?
a) 7 tables for each user, e.g. user7messages, user7mail, etc. In this case, if we have 1000 users there will be 7000 tables.
b) 7 tables, e.g. messages, mails, etc.; all the messages or mails of every user will be in the same table.
In this case, for 1000 users we have only 7 tables.
In most cases, on modern hardware and with reasonable tuning, your database should be able to support tens of millions of records without too much pain, as long as your data really is relational. If you're searching for text, or storing hierarchical data, or storing documents, or running reports, there are alternative options (e.g. NoSQL).
Where at all possible, stick with the orthodox way of using relational databases; that means normalization, query tuning, using caches and throwing hardware at the problem.
Only once you've proven you have a performance problem is it worth looking at more exotic solutions. Within RDBMS world, that might mean partitioning the data (sorta kinda similar to your "table per user" idea). Alternatively, you might jump to NoSQL.
The problem with your "table per user" strategy is that you gain almost no benefit when querying by index (on a modern RDBMS, searching a table with 1 row or a table with a million rows makes almost no difference for finding the data when you hit the index). For actions that don't hit the index, you should see a decent gain - but that's usually a sign you're not really relational in the first place...
It makes developing the client application rather error prone, and more complicated than it needs to be, especially when creating moderately complex SQL queries (e.g. multi-table joins) - and tuning those queries will become much harder as a result. You won't be able to use the tools available to manage database queries (e.g. ORM tools), as these are all based on the "standard" relational model.
The biggest problem is changing the database - if you have to add an attribute to "message", you have to repeat that change over 7000 tables. You'll either spend a lot of time writing custom database management scripts, or have a human being repeat the same thing thousands of times (and make hard-to-spot mistakes).
Case B will be much better; just make sure that your users have a user_id field that increments automatically, and link your tables together via that ID, e.g.
user_id email
1000 hello
This will improve lookup speed because you do not have to include functionality to pick a specific piece of data out of a search across thousands of tables (in that case it would mean searching the columns of table after table until the right table with the right column was found, which would be ludicrous).
But if you are searching a specific table (e.g. you only need messages), only 1 table is involved in the lookup, which is much faster and makes all the tables easier to manage at an admin level.
An even better idea would be 1 table with several columns, say a 'communications' table, which could look like:
user_id email messages
1000 hello hi
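Turning the case-B layout into concrete DDL might look roughly like this (SQL Server syntax; everything beyond user_id, email, and messages is an invented name), whether you keep separate messages and mails tables or fold them into a single communications table:

    CREATE TABLE users (
        user_id INT IDENTITY(1,1) PRIMARY KEY,   -- auto-incrementing id, as suggested
        email   VARCHAR(255) NOT NULL
    );

    CREATE TABLE messages (
        message_id INT IDENTITY(1,1) PRIMARY KEY,
        user_id    INT NOT NULL REFERENCES users (user_id),
        body       NVARCHAR(MAX) NOT NULL
    );

    -- Index the foreign key so "all messages for one user" is a seek, not a table scan.
    CREATE INDEX IX_messages_user_id ON messages (user_id);

    -- One shared table, filtered per user:
    SELECT body FROM messages WHERE user_id = 1000;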

Database design: storing many large reports for frequent historical analysis

I'm a long time programmer who has little experience with DBMSs or designing databases.
I know there are similar posts regarding this, but am feeling quite discombobulated tonight.
I'm working on a project which will require that I store large reports, multiple times per day, and have not dealt with storage or tables of this magnitude. Allow me to frame my problem in a generic way:
The process:
A script collects roughly 300 rows of information, set A, 2-3 times per day.
The structure of these rows never changes. The rows contain two columns, both integers.
The script also collects roughly 100 rows of information, set B, at the same time. The structure of these rows does not change either. The rows contain eight columns, all strings.
I need to store all of this data. Set A will be used frequently, and daily for analytics. Set B will be used frequently on the day that it is collected and then sparingly in the future for historical analytics. I could theoretically store each row with a timestamp for later query.
If stored linearly, with both sets of data in their own tables in a DBMS, the data will reach ~300k rows per year. Having little experience with DBMSs, this sounds high to me for two tables to manage.
I feel as though throwing this information into a database with each pass of the script will lead to slow read times and poor general responsiveness. For example, generating an Access database and tossing this information into two tables seems like too easy a solution.
I suppose my question is: how many rows is too many rows for a table in terms of performance? I know that it would be in very poor taste to create tables for each day or month.
Of course this only melts into my next, but similar, issue, audit logs...
300 rows about 50 times a day for 6 months is not a big load for any DB. Which DB are you going to use? Most will handle this load very easily. There are a couple of techniques for handling data fragmentation if the rows exceed more than a few hundred million per table. But with effective indexing and cleaning you can achieve the performance you desire. I myself deal with heavy data tables with more than 200 million rows every week.
Make sure you have indexes in place for the queries you will issue to fetch that data. Whatever you have in the WHERE clause should have an appropriate index in the DB.
If your row counts per table exceed many millions, you should look at partitioning of tables. DBs actually store data in the filesystem as files, so partitioning helps by splitting the data into smaller groups of data files based on some predicate, e.g. a date or some unique column. You would see it as a single table, but on the file system the DB would store the data in different filegroups.
Then you can also try table sharding, which is actually what you mentioned: different tables based on some predicate like date.
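As a hedged illustration of the partitioning idea in SQL Server (the monthly boundaries, table name, and column names are assumptions for the set-A data described above):

    -- Partition function and scheme: one partition per month.
    CREATE PARTITION FUNCTION pf_Monthly (DATE)
        AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

    CREATE PARTITION SCHEME ps_Monthly
        AS PARTITION pf_Monthly ALL TO ([PRIMARY]);   -- or map partitions to separate filegroups

    -- The partitioning column must be part of the clustered key, so it is included in the PK.
    CREATE TABLE dbo.SetA (
        setA_id   INT  IDENTITY(1,1) NOT NULL,
        value1    INT  NOT NULL,
        value2    INT  NOT NULL,
        CreatedOn DATE NOT NULL,
        CONSTRAINT PK_SetA PRIMARY KEY (setA_id, CreatedOn)
    ) ON ps_Monthly (CreatedOn);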
Hope this helps.
You are overthinking this. 300k rows is not significant. Just about any relational database or NoSQL database will not have any problems.
Your design sounds fine, however, I highly advise that you utilize the facility of the database to add a primary key for each row, using whatever facility is available to you. Typically this involves using AUTO_INCREMENT or a Sequence, depending on the database. If you used a nosql like Mongo, it will add an id for you. Relational theory depends on having a primary key, and it's often helpful to have one for diagnostics.
So your basic design would be:
Table A tableA_id | A | B | CreatedOn
Table B tableB_id | columns… | CreatedOn
The CreatedOn will facilitate date range queries that limit data for summarization purposes and allow you to GROUP BY on date boundaries (Days, Weeks, Months, Years).
Make sure you have an index on CreatedOn, if you will be doing this type of grouping.
Also, use the smallest data types you can for any of the columns. For example, if the range of the integers falls below a particular limit, or is non-negative, you can usually choose a datatype that will reduce the amount of storage required.
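A hedged rendering of that layout in SQL Server terms (type choices like SMALLINT are assumptions about the value ranges):

    CREATE TABLE dbo.TableA (
        tableA_id INT IDENTITY(1,1) PRIMARY KEY,     -- auto-incrementing surrogate key
        A         SMALLINT NOT NULL,                 -- smallest integer type that fits the data
        B         SMALLINT NOT NULL,
        CreatedOn DATETIME2(0) NOT NULL DEFAULT SYSUTCDATETIME()
    );

    -- Supports the date-range filtering and GROUP BY-on-date queries mentioned above.
    CREATE INDEX IX_TableA_CreatedOn ON dbo.TableA (CreatedOn);

    -- Example daily summary:
    SELECT CAST(CreatedOn AS DATE) AS collection_day,
           COUNT(*)                AS rows_collected,
           AVG(A * 1.0)            AS avg_A          -- * 1.0 avoids integer truncation
    FROM dbo.TableA
    GROUP BY CAST(CreatedOn AS DATE);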

Database design: one large table versus several smaller tables

I have to create a database to store information being sent and received to/from a 3rd party web service portal. There are about 150 fields of information to be sent, though I can remove about 50 of those fields by normalising (there are three sets of addresses that can be saved in an address table, for example). However, this still leaves a table that could potentially have 100 columns.
I've come up with two ways of handling this though I'm not sure which to use:
1. Have a table with 100 columns and three references to an address table.
2. Break it down into maybe 15-20 separate dedicated tables.
Option 1 seems the quickest as it involves the fewest joins but the idea of a table with 100 columns doesn't feel right.
Option 2 feels better and would break things down into more manageable chunks, but it won't save any database space and will increase the number of joins. Pretty much all the columns in the database will have a value and I cannot normalise these columns any further.
My question is, in this situation is it acceptable to have a table with c.100 columns in it or should I try and break it down over several tables for presentation?
Please note: The table structure will not change over the course of its usage; a new database would be created for a new version of the web service portal. I have no control over the web service data structure.
Edit: @Oded's answer below has made me think a bit more about how the data will be accessed; it will really only be accessed in whole and not in part. I wouldn't, for example, need to return columns 5-20 on a regular basis.
Answer: I accepted Oded's answer; the comments posted after it helped me make up my mind, and I decided to go with option 1. As the data is accessed in full, having one table seems the better solution. If, for example, I regularly wanted to access columns 5-20 rather than the full table row, then I'd look at breaking it up into separate tables for performance reasons.
Speaking from a relational purist point of view - first, there is nothing against having 100 columns in a table, if they are related. The point here is that if after normalizing you still have 100 columns, that's OK.
But you should normalize, and in the process you may very well end up with 15-20 separate dedicated tables, which most relational database professionals would agree is a better design (avoid data duplication with the update/delete issues associated, smaller data footprint etc...).
Pragmatically, however, if there is a measurable performance problem, it may be sensible to denormalize your design for a performance benefit. The key word here is measurable. Don't optimize before you have an actual problem.
In that respect, I'd say you should go with the set of 15-20 tables as an initial design.
From MSDN: Maximum Capacity Specifications for SQL Server:
Columns per nonwide table: 1,024
Columns per wide table: 30,000
So I think 100 columns is OK in your case. And maybe you also need to note (from the same link):
Columns per primary key: 16
Of course, this applies only if you need the data purely as a log for the service.
If, after reading from the service, you need to maintain the data, then normalising seems better...
If you find it easier to "manage" tables with fewer columns, however you happen to define manageability (e.g. less horizontal scrolling when looking at the table data in SSMS), you can break the table up into several tables with 1-to-1 relationships without violating the rules of normalization.
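A minimal sketch of such a 1-to-1 split, with invented table names, where the extension table simply reuses the parent's primary key:

    CREATE TABLE dbo.Submission (
        submission_id INT IDENTITY(1,1) PRIMARY KEY,
        reference_no  VARCHAR(50) NOT NULL
        -- plus the columns you look at most often
    );

    CREATE TABLE dbo.SubmissionDetail (
        submission_id INT NOT NULL PRIMARY KEY
            REFERENCES dbo.Submission (submission_id),   -- shared key enforces the 1-to-1 link
        notes         NVARCHAR(MAX) NULL
        -- plus the remaining, rarely viewed columns
    );

    -- Reassembling the full row is a single key-to-key join:
    SELECT s.submission_id, s.reference_no, d.notes
    FROM dbo.Submission s
    JOIN dbo.SubmissionDetail d ON d.submission_id = s.submission_id;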

PHP Geek needs Database Design Assistance

Like most developers I think I am always striving to create the most optimal code and database schemas.
However, I've got the feeling that I'm over-engineering the database schema that I want to create.
I have a web app that, in a short space of time, will hold a lot of users. The users are customers, suppliers, and system users. It's in an industry that is likely to grow rapidly.
In previous schemas I have those users separated in different tables.
However, I am now thinking of going down the route of having one table called: PEOPLE.
There will be these tables:
People,
Contact Details,
Residences
They are related via pivot tables, i.e.:
PivotContacts
PivotResidences.
My question: is this considered good or bad design?
Am I overthinking and over-engineering a simple setup?
The People table will grow exponentially and will hold a LOT of data, and other tables will relate to it.
I would really welcome opinions.
Will my design scale to 100 thousand records and maintain moderate speed? (It will initially start with 1000 records and will likely grow to approx 100,000 in 1 year.)
For users that can log in, and maybe are tracked (last login, failed password retries), it is optimal to have a small table and maybe a separate table for writing (a distinction between reading and writing data).
Any table with people in general has a tendency to collect a tremendous number of fields. Functional distinctions kept in different tables keep the data tidy; indexing on a suppliers table is nicer and maybe more optimal, as are changes to supplier data. SQL JOINs are manageable and could be done with SQL views.
So I would go for a thin base table People and 1:1 tables SupplierPeople, SystemUsingPeople and so on. And consider which changes happen: how often tables are updated, inserted into, and read.
Also consider having to modify the database schema, e.g. adding a field.
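A sketch of that thin-base-table idea (the column names, and the separate login-activity table for write-heavy data, are my own assumptions):

    CREATE TABLE dbo.People (
        person_id INT IDENTITY(1,1) PRIMARY KEY,
        full_name NVARCHAR(200) NOT NULL
    );

    -- 1:1 role tables keep supplier- or system-user-specific fields out of the base table.
    CREATE TABLE dbo.SupplierPeople (
        person_id    INT NOT NULL PRIMARY KEY REFERENCES dbo.People (person_id),
        supplier_ref VARCHAR(50) NOT NULL
    );

    -- Frequently written login-tracking data kept in its own table,
    -- separating write-heavy rows from the read-mostly base data.
    CREATE TABLE dbo.PersonLoginActivity (
        person_id      INT NOT NULL PRIMARY KEY REFERENCES dbo.People (person_id),
        last_login     DATETIME2 NULL,
        failed_retries INT NOT NULL DEFAULT 0
    );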
If you're only worried about the scalability of your solution, 100K records is not a particularly large number, subject to some (important) assumptions.
Modern database software (I assume you're going to use MySQL as you say you're a PHP hand) running on modern hardware can easily handle databases with millions of records, as long as you have a well-designed table layout, and can use indices.
Your design - linking "people" to "contacts" and "residences" can use primary/foreign keys to join; that should easily scale to your requirements.
It's worth considering the likely queries you're going to be running, though - I'm guessing you will need to be able to search "people" by name, or by address, or by city, or by last contact date, etc. This suggests you may need free-text searching - once you get to large numbers of records, using where name like '%Jones%' can be slow.
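On the name-search point, a full-text index is the usual answer to slow leading-wildcard LIKE queries. A hedged SQL Server-flavoured sketch (the answer assumes MySQL, which has an analogous FULLTEXT index; the object names here are invented, and PK_People is assumed to be the name of a unique index on the table):

    CREATE FULLTEXT CATALOG people_catalog AS DEFAULT;

    CREATE FULLTEXT INDEX ON dbo.People (full_name)
        KEY INDEX PK_People;

    -- Word-based search that can use the full-text index,
    -- unlike WHERE full_name LIKE '%Jones%', which has to scan every row.
    SELECT person_id, full_name
    FROM dbo.People
    WHERE CONTAINS(full_name, 'Jones');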
You may also want to consider archiving/history strategies - do you need to store the history of someone's residences (so you can find out where they lived when they placed the order)?

SQL Server: Friends class "facebook like"

I need that one user can be a friend of another user.
So I think that I can use 2 tables:
table User (Id, Name, Date, Etc.)
table Friends (UserId, Friend_UserId)
With these tables, I can record that one user is a friend of another user.
The problem is that for 1000 users, there could be 1000*1000 = 1,000,000 rows.
I don't know how many rows can be stored in SQL Server, but my question is:
what happens with 1 million users? Does SQL Server crash?
Can anyone share experience with a Friends class (in the OOP sense, I remember that this was the name)?
In your example, 1000*1000 means every user is friends with everyone. Why store the associations at all?
Is that really going to be the case? Or are you just testing the theoretical limit?
And are you going to store two copies of every association? Say user 1 and User 2 are friends. Are you going to have a table with both of these rows:
UserId, Friend_UserId
1 2
2 1
That would double the amount of data needed. You should make it a rule that the lower of the two IDs is always in one column.
And of course, make sure you index both columns.
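Putting those two suggestions together (one row per friendship with the lower id first, plus an index on each column) might look like this, reusing the table names from the question; the constraint and index names are mine:

    CREATE TABLE dbo.[User] (
        Id   INT IDENTITY(1,1) PRIMARY KEY,
        Name NVARCHAR(100) NOT NULL
    );

    CREATE TABLE dbo.Friends (
        UserId        INT NOT NULL REFERENCES dbo.[User] (Id),
        Friend_UserId INT NOT NULL REFERENCES dbo.[User] (Id),
        CONSTRAINT PK_Friends PRIMARY KEY (UserId, Friend_UserId),
        CONSTRAINT CK_Friends_LowerFirst CHECK (UserId < Friend_UserId)   -- store each pair once
    );

    -- The primary key already covers lookups by UserId; this covers lookups by Friend_UserId.
    CREATE INDEX IX_Friends_Friend ON dbo.Friends (Friend_UserId);

    -- All friends of user 42, whichever column they ended up in:
    SELECT CASE WHEN UserId = 42 THEN Friend_UserId ELSE UserId END AS FriendId
    FROM dbo.Friends
    WHERE UserId = 42 OR Friend_UserId = 42;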
You can store 1000 users and friends on a server, you can store 10,000 users and friends on a server, you can maybe store 1M users and friends on a server, but for sure you won't be able to store 100M users and friends and have any decent latency for responses that write/read the DB. With social web-scale apps there is simply no single box that can handle a successful app. This is why the name of the game is scale-out and sharding. You can choose to ignore the problem for now, or you can choose to design for it. Also, see Sharding With SQL Azure; it is worth reading into other alternatives as well.
One of the main purposes of a good SQL server is specifically holding, and querying, many rows. There is no problem (beyond maybe physical server resources, such as HDD space) that will prevent 1,000,000 rows in a table.
but my question is what happens for 1 million users? Does SQL Server crash?
Yes, if you run it on your mobile phone.
Run it on appropriate hardware and it can easily handle hundreds of billions of rows in a table. Data warehouse projects do that all the time.
