Create an application with only one DB table

This is an odd question.
Assume a highly DB-dependent application such as a forum or chat website. Since joins on huge tables can be expensive, can we put everything into one table and avoid all joins, unions, etc.? That way we would only ever query one table, and with proper indexes it should be fast.
I know that nobody wants all their eggs in one basket, but what is the main reasoning that leads developers to avoid this scenario?

Related

Database table creation for large data

I am making a client management application in which I am storing data for employees, admins, and companies. In the future the database will have hundreds of companies registered. I am trying to go with the best approach to database design.
I can think of 2 approaches:
Creating a separate set of tables for each company
Storing all companies' data in a shared set of application tables
Can you suggest the best way to do that?
Please note that all 3 tables are linked on the basis of IDs; there will be hundreds of companies, each company will have many admins, and each admin will have hundreds of employees. What would be the best approach with respect to security and query performance?
With the partial information you provided, it looks like three normalized tables are what you need, plus auxiliary data like lookups and other supporting tables.
But when you design a database you need to consider many more points: security, visibility, client access methods, etc.
For example, if you want to ensure isolation and not allow users any visibility into other companies' data, you could dynamically create a schema per company, and dynamically create users and access rights for each schema. You would then need to support all of this in the DAL, which would in fact end up being quite fat.
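A rough, hypothetical sketch of that per-company setup in SQL Server (the schema name, login, and table below are placeholders; the application would generate these statements dynamically for each new company):

-- Hypothetical: one schema, one login/user, and one set of tables per company.
CREATE SCHEMA CompanyA;
GO
CREATE TABLE CompanyA.Employee (
    EmployeeId INT IDENTITY PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL
);
GO
CREATE LOGIN CompanyA_Login WITH PASSWORD = 'ChangeMe123!';  -- placeholder password
CREATE USER CompanyA_User FOR LOGIN CompanyA_Login;
-- The company's user can only touch objects in its own schema.
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::CompanyA TO CompanyA_User;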
Another approach for the DAL could be exposing views that always return subsets for a single company.
A big reason I would suggest going with the normalized approach is that maintenance will be much easier this way.
From a SQL point of view I don't see any performance advantage in having many tables versus just three; the efficiency of the indexes and a smart DAL are what will make the difference.
The performance of a query doesn't depend so much on the size of the table as on the indexes you have on that table. So you need to define clustered and non-clustered indexes as per your requirements, and I can guarantee that up to 10 GB of data you will not face any problems.
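As an illustration only (the table and column names below are invented):

-- Clustered index on the key most queries seek or range on...
CREATE CLUSTERED INDEX CIX_Activity_CompanyId_Date
    ON dbo.ActivityLog (CompanyId, ActivityDate);

-- ...plus a non-clustered index supporting another common filter.
CREATE NONCLUSTERED INDEX IX_Activity_UserId
    ON dbo.ActivityLog (UserId);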
This is a classic problem shared by most web business services: for discussions of the factors involved, Google "multi-tenant architecture."
You almost certainly want to put all companies into a common set of tables: each data table should reference the company key, and all queries should join on that key, among their other criteria. This allows the best overall performance, and saves you the potential maintenance nightmare of duplicating views, stored procedures and so on hundreds of times, or of having to apply the same structural changes to hundreds of tables should you wish to add a field or a table.
To help assure that you don't inadvertently intermingle data from different customers, it might be useful to do all data access through a validated set of stored procedures (all of which take the company ID as a parameter).
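A minimal sketch of that pattern, assuming hypothetical table and procedure names (your actual schema will differ):

-- Every data table carries the company key.
CREATE TABLE dbo.Employee (
    EmployeeId INT IDENTITY PRIMARY KEY,
    CompanyId  INT NOT NULL REFERENCES dbo.Company (CompanyId),  -- assumes a dbo.Company table
    Name       NVARCHAR(100) NOT NULL
);
GO
-- All access goes through procedures that require the company ID,
-- so a query can never forget the tenant filter.
CREATE PROCEDURE dbo.GetEmployees
    @CompanyId INT
AS
BEGIN
    SELECT EmployeeId, Name
    FROM dbo.Employee
    WHERE CompanyId = @CompanyId;
END;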
Hundreds of parallel databases will not scale very well: the DB server will constantly be pushing tables and indexes out of memory to accommodate the next query, resulting in disk thrashing and poor performance, as well. There is only pain down that path.
Depending on the use cases of your application, there is no single "best" way.
Please explain the operations your application will provide so we can get further insight into your problem.
The data to be stored seems to be structured, so at first glance a relational database would work out well, but keep in mind the point I made above.
You have not said how this data links at all or if there are even any links between them. However, at a guess, you need 3 tables.
EmployeeTable
AdminTable
CompanyTable
Each with the required properties in it; without additional information I'm not able to provide much more guidance.
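As a rough guess at the shape only (column names and keys below are placeholders, not the OP's actual design):

CREATE TABLE dbo.CompanyTable (
    CompanyId INT IDENTITY PRIMARY KEY,
    Name      NVARCHAR(200) NOT NULL
);

CREATE TABLE dbo.AdminTable (
    AdminId   INT IDENTITY PRIMARY KEY,
    CompanyId INT NOT NULL REFERENCES dbo.CompanyTable (CompanyId),  -- each admin belongs to a company
    Name      NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.EmployeeTable (
    EmployeeId INT IDENTITY PRIMARY KEY,
    AdminId    INT NOT NULL REFERENCES dbo.AdminTable (AdminId),     -- each employee reports to an admin
    Name       NVARCHAR(100) NOT NULL
);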

When is a flat DB design acceptable

When is it OK to use a flat DB table design nowadays? Ever? What I mean is: when is it OK to abandon the wisdom of relational database design and revert to a flat table structure that incorporates no links, adding extra columns to hold more data when we should be creating a key to another table and storing multiple rows?
I'm working on some ideas to discuss with a product management team. When I initially asked the question "Why are all these tables flat in nature" I was told that
"Read centric databases display better performance with a flat table structure."
I struggle with this explanation because a flat design presents so many barriers to progress down the road.
Thoughts?
"Read centric databases display better performance with a flat table structure." This statement says table won't/rarely be used to insert/update/delete operations. In that case table must be properly indexed to get good performance. Since there won't be any kind of joins so table would be using lot of filters in where clause hence indexing is really important to be used appropriately.
This kind of scenario is most common in data warehouses. When we design warehouses, we usually eliminate primary/foreign key constraints and use business keys instead, because of the huge size of the warehouse database.
Never.
Whatever problem you think you are going to solve by ignoring relational database theory, you will only create many more intractable problems. Furthermore, the original problem that you attempt to avoid by ignoring relational theory will invariably be based on a misconception anyway.
Short answer: Almost always!
Your website almost never needs a conventional database!
After 20 years of working as an IT admin on big and small projects, I can say with confidence that over 90% of today's websites do not need a database AT ALL.
It's just another layer of obfuscation that most companies and people can do without.
Face the facts, people. Most websites out there don't get a single hit in a day, so talking about database performance is quite silly for the huge majority of websites today (2019).
That means that over 90% of these sites could and should switch to some flat-file CMS like PageKit, Grav or Bludit (my personal favorite because of its minimalistic approach: it forgoes even a flat-file DB and uses ordinary folders to hold articles as HTML files).
I never did figure out why CMS leaders like WordPress and Joomla insist on complicating their default setup by forcing their users into a database connection and configuration that is often the very reason the site malfunctions. Only when a site actually needs some type of DB, for instance because it has many user accounts, is a database warranted. Still, most websites have only a handful of user accounts.
Many times we see a site down because the database engine is down or can't handle so many simultaneous connections, while the Apache or Nginx web server is still up and running.
Don't just follow others blindly. Time to be brave and lead.

Will creating separate databases in SQL Server give me better performance?

All, I'm a programmer by trade but for this particular project I'm finding myself being the DBA as well. Here is the scenario I'm faced with:
Web app with anywhere from 400-1000 customers. A customer is a "physical company", each of which has n-number of users. Each customer (company) has on average 1 GB worth of data (total of about 200 million rows). Each company has probably 80% similar data in terms of the type of data stored. The other 20% is custom data that the companies can themselves define (basically custom fields).
I am trying to figure out the best way to scale this on the cheap when you consider that the customers need pretty good reaction time. For example, customer X might want to grab all records where last name like 'smith' and phone like '555', whereas customer Y might want to grab all records where account number equals '1526A'.
Bottom line, performance is key and I'm finding it hard to decide what to index and if that is even going to help me given the fact these guys can basically create their own query through the UI.
My question is, what would you do? Do you think it would be wise to break each customer out into its own DB? Total DB size at the moment is around 400 GB.
It is a complete re-write so I have the fortune of being able to start fresh if needed. Any thoughts, hints would be greatly appreciated.
Bottom line, performance is key and I'm finding it hard to decide what to index and if that is even going to help me given the fact these guys can basically create their own query through the UI.
Bottom line, you're ceding your DB performance to the whims of your clients. If they're able to "create their own query", then they're able to "create their own REALLY BAD queries".
So, if you run this in a shared environment (i.e. the same hardware), then customer A's awful table scans can saturate the I/O for everyone else.
If they're on the same database server, then Customer A's scans get to flush all of your other customers' data from the data cache.
Basically, the more you "share", the more one customer can impact the operations of other customers. If you give customers the capability to do expensive things, and share much of it, then everyone suffers.
So, the options are a) don't let the customers do silly things or b) keep the customers as separated as practical so that when one does do silly things, the phones don't light up from all of the other customers.
If you don't know "what to index" then you are not offering much control over what the customers can do, and thus the silly thing factor goes way up.
You would probably get quite far by offering several popular, pre-made SQL views that the customers can select from, and then they're limited to simply filtering and possibly ordering the results. Then you optimize around execution of those views.
It's likely that surprisingly few "general" views can cover a large amount of the use cases.
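As a sketch only (the view, table, and columns below are made up), one such pre-made view and the kind of limited filtering customers would then do against it:

-- A curated view you optimize and index around; customers can only filter
-- and order its output from the UI.
CREATE VIEW dbo.CustomerContactSearch
AS
SELECT c.CustomerId, c.LastName, c.Phone, c.AccountNumber
FROM dbo.Contact AS c;
GO
-- The UI turns a customer's ad-hoc search into a plain filter over the view:
SELECT CustomerId, LastName, Phone
FROM dbo.CustomerContactSearch
WHERE LastName LIKE 'smith%' AND Phone LIKE '%555%';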
Generic, silly queries can be delegated to a batch process that runs overnight, during off hours, or to a separate machine that doesn't impact transactional performance, such as a nightly snapshot with "everything but today's data" on it. Let them run historic queries against that.
The SO question How to design a multi tenant database has a link to a decent article on the tradeoffs along the spectrum from "shared nothing" to "shared everything". Also, SO has a tag for those kinds of questions; I added it for you.
Creating separate databases on the same server won't help you get better performance. The performance optimisations available to you with multiple databases are just the same as you can achieve with one database.
Separate databases might make sense for administrative reasons - if different backup or availability requirements apply to different customers for example.
It's still probably sensible to build your application so that it can support multiple databases so that you have the option of scaling out over multiple DB servers.
If you have separate databases, the 80% that is the same becomes almost impossible to keep the same over time. You will end up spending far more money on maintenance.
Luckily SQL Server has some options for you. First, put the customer-specific information in the same database in a separate schema and the common stuff in a different schema (create a common schema and a schema for each client).
Next, set up data partitioning by client. This can require the proper hardware to do effectively.
Now you have one code base for the common part, which will propagate changes to all clients at once, and clients are separated for performance using the partitions.
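Very roughly (the boundary values, schema, and table below are placeholders), SQL Server partitioning by client might look like:

-- Partition rows by ClientId so each client's data lives in its own partition
-- (and, given the hardware, potentially its own filegroup).
CREATE PARTITION FUNCTION pfClient (INT)
    AS RANGE LEFT FOR VALUES (100, 200, 300);   -- illustrative client-id boundaries

CREATE PARTITION SCHEME psClient
    AS PARTITION pfClient ALL TO ([PRIMARY]);   -- all partitions on PRIMARY, for the sketch

CREATE TABLE common.CustomerData (
    ClientId INT NOT NULL,
    RecordId INT NOT NULL,
    Payload  NVARCHAR(MAX) NULL,
    CONSTRAINT PK_CustomerData PRIMARY KEY (ClientId, RecordId)
) ON psClient (ClientId);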

What is the best database design for thousands of rows

I'm about to start a Database Design that will simply manage users under companies.
Each company will have an admin area that can manage users
Each company will have around 25,000 users
The client expects to have around 50 companies to start
My main question is
Should I create tables based on Companies? like
users_company_0001 users_company_0002 users_company_0003 ...
as each company will never use "other" companies' users, and nothing will ever need to sum/count across all the user_company tables (a simple JOIN would do the trick; though more expensive in time, it would work if the big picture were ever needed, which it won't be).
or should I just create one users table holding (50 x 25,000) 1,250,000 users (and growing)?
I'm leaning towards the first option, though I'm not sure how I would use Entity Framework with such a layout... I would probably need to go back to the '90s and generate my data logic layer by hand,
as it would just be a matter of calling stored procedures with the company ID.
What will you suggest?
The system application will be ASP.NET (probably MVC; I'm still trying to figure this out as all my knowledge is in WebForms, though I saw Scott Hanselman's MVC videos - seems easy - but I know it will not be that easy, as problems will come and I will take more time to fix them), plus Microsoft SQL Server.
Even though you've described this as a 1-many relationship, I'd still design the DB as many-to-many to guard against a future change in requirements. Something like:
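A many-to-many layout along those lines might be (table and column names here are illustrative):

CREATE TABLE dbo.Company (
    CompanyId INT IDENTITY PRIMARY KEY,
    Name      NVARCHAR(200) NOT NULL
);

CREATE TABLE dbo.[User] (
    UserId INT IDENTITY PRIMARY KEY,
    Name   NVARCHAR(100) NOT NULL
);

-- Junction table: today each user belongs to one company, but this shape
-- already supports a user belonging to several.
CREATE TABLE dbo.CompanyUser (
    CompanyId INT NOT NULL REFERENCES dbo.Company (CompanyId),
    UserId    INT NOT NULL REFERENCES dbo.[User] (UserId),
    CONSTRAINT PK_CompanyUser PRIMARY KEY (CompanyId, UserId)
);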
Having worked with a multi-terabyte SQL Server database, and having experience with hundreds of tables with multi-million rows over the course of my career, I can tell you with full assurance that SQL Server can handle your company and users tables without partitioning. Partitioning is always there when you need it, but your worry shouldn't be about your tables: pick the simplest schema that meets your needs. If you want to do something to optimize performance, your bottleneck will almost assuredly be your disks. Don't buy large, slow disks. Get yourself a bunch of small, high-RPM disks and spread your data out across them as much as possible, and don't share disks between your logs and your data. With databases, you're almost always better off achieving performance with good hardware, a good disk subsystem, and proper indexing. Don't compromise and overcomplicate your schema trying to anticipate performance - you'll regret it. I've seen really big databases where that sort of thing was necessary, but yours ain't it.
re: Should I create tables based on Companies?
Yes.
Like users_company_0001, users_company_0002, users_company_0003?
No. Like: companyID, companyName, contactID.
Or should I just create a users table to have (50 x 25,000) 1,250,000 users (and growing)?
Yes.
I think you should create separate tables for Company and User, then a third table to connect the two: CompanyAdmin. Something like:
Company(Company_Id, Company_name, ...)
User(User_Id, User_name, ...)
CompanyAdmin(Company_id, User_id)
This way you can add users and/or companies without affecting the number of tables you need to manage. It is generally a bad design if you need to modify the database (i.e. add tables) whenever new data (companies) are added to the system.
With proper indexing, the join costs in a database containing a few million rows should not be a problem.
Finally, if you ever need to change or record additional information about Companies, Users, or the relationship between them, this setup should have the least impact on your application.

How would you design your database to allow user-defined schema

If you have to create an application like - let's say - a blog application, creating the database schema is relatively simple. You have to create some tables, tblPosts, tblAttachments, tblComments, tblBlaBla… and that's it (OK, I know that's a bit simplified, but you understand what I mean).
What if you have an application where you want to allow users to define parts of the schema at runtime. Let's say you want to build an application where users can log any kind of data. One user wants to log his working hours (startTime, endTime, project Id, description), the next wants to collect cooking recipes, others maybe stock quotes, the weekly weight of their babies, monthly expenses they spent for food, the results of their favorite football teams or whatever stuff you can think about.
How would you design a database to hold all that very very different kind of data? Would you create a generic schema that can hold all kind of data, would you create new tables reflecting the user data schema or do you have another great idea to do that?
If it's important: I have to use SQL Server / Entity Framework
Let's try again.
If you want them to be able to create their own schema, then why not build the schema using, oh, I dunno, the CREATE TABLE statement? You have a fully functional, powerful database that can do amazing things like define schemas and store data. Why not use it?
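A minimal sketch of that idea, assuming the UI collects a table name and field definitions and the application builds the DDL (the names below are hypothetical, and real code would whitelist the column types and validate every identifier):

-- Hypothetical: turn a user's field picks into real DDL for their own table.
DECLARE @UserTable SYSNAME = N'WorkLog_User42';   -- name generated for this user
DECLARE @Ddl NVARCHAR(MAX) =
    N'CREATE TABLE usr.' + QUOTENAME(@UserTable) + N' (
        Id          INT IDENTITY PRIMARY KEY,
        StartTime   DATETIME2      NOT NULL,
        EndTime     DATETIME2      NOT NULL,
        ProjectId   INT            NULL,
        Description NVARCHAR(400)  NULL
     );';   -- assumes a dedicated usr schema exists for user-created tables
EXEC sys.sp_executesql @Ddl;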
If you were just going to do some ad-hoc properties, then sure.
But if it's "carte blanche, they can do whatever they want", then let them.
Do they have to know SQL? Umm, no. That's your UI's task. Your job as a tool and application designer is to hide the implementation from the user. So present lists of fields, lines and arrows if you want relationships, etc. Whatever.
Folks have been making "end user", "simple" database tools for years.
"What if they want to add a column?" Then add a column, databases do that, most good ones at least. If not, create the new table, copy the old data, drop the old one.
"What if they want to delete a column?" See above. If yours can't remove columns, then remove it from the logical view of the user so it looks like it's deleted.
"What if they have eleventy zillion rows of data?" Then they have a eleventy zillion rows of data and operations take eleventy zillion times longer than if they had 1 row of data. If they have eleventy zillion rows of data, they probably shouldn't be using your system for this anyway.
The fascination with "implementing databases on databases" eludes me.
"I have Oracle here, how can I offer fewer features and make it slower for the user??"
Gee, I wonder.
There's no way you can predict how complex their data requirements will be. Entity-Attribute-Value is one typical solution many programmers use, but it might not be sufficient, for instance if the user's data would conventionally be modeled with multiple tables.
I'd serialize the user's custom data as XML or YAML or JSON or similar semi-structured format, and save it in a text BLOB.
You can even create inverted indexes so you can look up specific values among the attributes in your BLOB. See http://bret.appspot.com/entry/how-friendfeed-uses-mysql (the technique works in any RDBMS, not just MySQL).
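A rough sketch of that approach in SQL Server (table, column, and attribute names are made up): the custom data goes into one text column, and a side table acts as the inverted index over the attributes you want searchable.

-- Each entity's user-defined fields live in a single serialized text column.
CREATE TABLE dbo.UserEntity (
    EntityId   INT IDENTITY PRIMARY KEY,
    OwnerId    INT NOT NULL,
    CustomData NVARCHAR(MAX) NOT NULL        -- JSON/YAML/XML blob
);

-- "Inverted index": one row per searchable attribute value.
CREATE TABLE dbo.UserEntityIndex (
    AttributeName  NVARCHAR(100) NOT NULL,
    AttributeValue NVARCHAR(400) NOT NULL,
    EntityId       INT NOT NULL REFERENCES dbo.UserEntity (EntityId),
    CONSTRAINT PK_UserEntityIndex PRIMARY KEY (AttributeName, AttributeValue, EntityId)
);

-- Find the entities whose "projectId" attribute equals '1526':
SELECT e.EntityId, e.CustomData
FROM dbo.UserEntityIndex AS i
JOIN dbo.UserEntity AS e ON e.EntityId = i.EntityId
WHERE i.AttributeName = N'projectId' AND i.AttributeValue = N'1526';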
Also consider using a document store such as Solr or MongoDB. These technologies do not need to conform to relational database conventions. You can add new attributes to any document at runtime, without needing to redefine the schema. But it's a tradeoff -- having no schema means your app can't depend on documents/rows being similar throughout the collection.
I'm a critic of the Entity-Attribute-Value anti-pattern.
I've written about EAV problems in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Here's an SO answer where I list some problems with Entity-Attribute-Value: "Product table, many kinds of products, each product has many parameters."
Here's a blog I posted the other day with some more discussion of EAV problems: "EAV FAIL."
And be sure to read this blog "Bad CaRMa" about how attempting to make a fully flexible database nearly destroyed a company.
I would go for a Hybrid Entity-Attribute-Value model, so like Antony's reply, you have EAV tables, but you also have default columns (and class properties) which will always exist.
Here's a great article on what you're in for :)
As an additional comment, I knocked up a prototype for this approach using Linq2Sql in a few days, and it was a workable solution. Given that you've mentioned Entity Framework, I'd take a look at version 4 and their POCO support, since this would be a good way to inject a hybrid EAV model without polluting your EF schema.
On the surface, a schema-less or document-oriented database such as CouchDB or SimpleDB for the custom user data sounds ideal. But I guess that doesn't help much if you can't use anything but SQL and EF.
I'm not familiar with the Entity Framework, but I would lean towards the Entity-Attribute-Value (http://en.wikipedia.org/wiki/Entity-Attribute-Value_model) database model.
So, rather than creating tables and columns on the fly, your app would create attributes (or collections of attributes) and then your end users would complete the values.
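For reference, the bare-bones EAV shape is something like this (a sketch only; see the EAV criticisms earlier in this thread for why it can get painful):

CREATE TABLE dbo.Entity (
    EntityId   INT IDENTITY PRIMARY KEY,
    EntityType NVARCHAR(50) NOT NULL          -- 'Recipe', 'WorkLogEntry', ...
);

CREATE TABLE dbo.Attribute (
    AttributeId INT IDENTITY PRIMARY KEY,
    Name        NVARCHAR(100) NOT NULL
);

-- Every value is stored as text, which is where much of the EAV pain starts.
CREATE TABLE dbo.AttributeValue (
    EntityId    INT NOT NULL REFERENCES dbo.Entity (EntityId),
    AttributeId INT NOT NULL REFERENCES dbo.Attribute (AttributeId),
    Value       NVARCHAR(400) NULL,
    CONSTRAINT PK_AttributeValue PRIMARY KEY (EntityId, AttributeId)
);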
But, as I said, I don't know what the Entity Framework is supposed to do for you, and it may not let you take this approach.
Not as a critical comment, but it may help save some of your time to point out that this is one of those Don Quixote Holy Grail type issues. There's an eternal quest for probably over 50 years to make a user-friendly database design interface.
The only quasi-successful ones that have gained any significant traction that I can think of are 1. Excel (and its predecessors), 2. Filemaker (the original, not its current flavor), and 3. (possibly, but doubtfully) Access. Note that the first two are limited to basically one table.
I'd be surprised if our collective conventional wisdom is going to help you break the barrier. But it would be wonderful.
Rather than re-implement SQL Server's "CREATE TABLE" statement, which was done many years ago by a team of programmers who were probably better than you or me, why not work on exposing SQL Server in a limited way to the users: let them create their own schema in a limited way and leverage the power of SQL Server to do it properly.
I would just give them a copy of SQL Server Management Studio, and say, "go nuts!" Why reinvent a wheel within a wheel?
Check out this post: you can do it, but it's a lot of hard work. :) If performance is not a concern, an XML solution could work too, though that is also a lot of work.
