Related
Imagine a large corporation with dozens of companies, each with its own website, and each website with its own unique functional requirements
Most data on each website will be specific to that website
Each website can edit its own data
Some data will be shared across all websites
There will be a central CMS that is allowed to edit this data, but other websites can read and use that data
e.g. say you're planning the infrastructure for a company that owns multiple sub-companies that make different kinds of products, some in the same category (cereal, food), others in completely different categories (books, instruments). Some sites are marketing websites, some are for CRM, some are online stores
there is a list of regulatory requirements that affects all products
each company should manage the compliance status of its own products against each requirement
when a new requirement surfaces, details about that requirement should only be entered once
How would the multiple databases be coordinated?
edit: added more info per Bob's suggestions
Thanks for the incredibly insightful questions!
compliance data is not shared; it is siloed within each site
shared data lives only in the one enterprise-wide database; it will mostly be "types of [thing]" lookup values
there is no conclusive list of places where the shared data will be used, but currently it would populate CMS dropdowns for the individual sites.
changes to shared data would occur a few times a year.
Ideally changes would be reflected within a few minutes, but an hour or so should be acceptable
very low volume in shared data.
All DBs will be new, decision on which DB is pending current investigation.
Sub-systems will expose REST APIs.
Here are some ways I have seen this handled; you need to think about the implications of each structure based on the details of your particular business domain. All of them can work, but each has to be carefully set up if it is going to work.
One database for shared information and one per client for client-specific information. Set up the overall application so that the first thing a user selects on log-in is the client, and the application connects to the correct client database. Users might also need a way to switch clients if some of them handle multiple clients.
Separate servers for each client if they need to be completely siloed. Database changes are made by script (kept in source control) and applied to each server as needed. The central database might then have a job that runs to push any data changes out to the other servers.
All the data in one database, but making sure each table has a client_id so that the data is always filtered correctly by client. You can set up separate views by client, so that the users can only see the clients they are supposed to see. This only works if the data for each client is substantially in the same form.
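As a rough sketch of that last option (all table, view, and role names here are illustrative, not from the question):

    -- every table carries a client_id so data is always filtered by client
    CREATE TABLE orders (
        order_id   INT PRIMARY KEY,
        client_id  INT NOT NULL,
        order_date DATE,
        amount     DECIMAL(10, 2)
    );

    -- one view per client; each client's users are granted only their own view
    CREATE VIEW orders_client_42 AS
        SELECT order_id, order_date, amount
        FROM orders
        WHERE client_id = 42;

    GRANT SELECT ON orders_client_42 TO client_42_role;  -- role assumed to exist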
And since you are in a regulatory environment, I strongly urge you to create an audit database for each database, updated by database triggers (never audit from the application; you will miss changes made directly to the data).
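For example, a trigger-based audit of a compliance-status table could look something like this T-SQL sketch (all object names are invented):

    -- the table each site maintains for its own products
    CREATE TABLE compliance_status (
        product_id INT PRIMARY KEY,
        status     VARCHAR(50)
    );

    -- audit table filled in by the trigger, never by the application
    CREATE TABLE compliance_status_audit (
        audit_id   INT IDENTITY PRIMARY KEY,
        product_id INT,
        old_status VARCHAR(50),
        new_status VARCHAR(50),
        changed_by SYSNAME  DEFAULT SUSER_SNAME(),
        changed_at DATETIME DEFAULT GETDATE()
    );

    CREATE TRIGGER trg_audit_compliance_status
    ON compliance_status
    AFTER UPDATE
    AS
    BEGIN
        INSERT INTO compliance_status_audit (product_id, old_status, new_status)
        SELECT d.product_id, d.status, i.status
        FROM deleted d
        JOIN inserted i ON i.product_id = d.product_id;
    END;

Because the trigger fires inside the database, it captures changes made by any application or ad-hoc query, not just your own code.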
I agree with Chris that, even after both sets of questions, there is still a big set of possible solutions. For instance, if the databases were the same technology, and the shared data were stored in the same way in each one, you could do db-level replication from the central db to the others. Is it OK to have two separate dbs per application (one with the shared data and one with the non-shared data)? This would influence the kind of replication.
Or you could have a purely code solution, where clicking publish in a GUI that updates the central db also calls a set of APIs that update the other dbs. Or micro-services: updating the central db also creates a message on a shared queue, which is picked up by services that each look after a different db and apply the updates in whatever form makes sense for that db.
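One way to sketch the queue idea without committing to a particular messaging product is an "outbox" table in the central db that each site's sync job polls; everything below uses hypothetical names:

    -- written by the same transaction that updates the shared reference data
    CREATE TABLE shared_data_outbox (
        event_id    INT IDENTITY PRIMARY KEY,
        entity_name VARCHAR(100),       -- e.g. 'regulatory_requirement'
        entity_id   INT,
        payload     NVARCHAR(MAX),      -- serialized row, e.g. JSON
        created_at  DATETIME DEFAULT GETDATE()
    );

    -- each site remembers the last event it applied
    CREATE TABLE site_cursor (
        site_name     VARCHAR(100) PRIMARY KEY,
        last_event_id INT NOT NULL DEFAULT 0
    );

    -- a site's sync job polls for anything newer than its cursor
    SELECT o.event_id, o.entity_name, o.entity_id, o.payload
    FROM shared_data_outbox o
    JOIN site_cursor c ON c.site_name = 'site_a'
    WHERE o.event_id > c.last_event_id
    ORDER BY o.event_id;

Given the stated volumes (a few changes a year, acceptable latency of minutes to an hour), a simple polling job like this is more than enough.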
It depends on (among the things already mentioned) what your organisation's technology strategy is, what technology and skills you already have in-house, and so on.
So this is as much an architecture question as it is a db question.
I don't think this question is sufficiently clear to get a single answer. However there are a few possibilities.
In many cases, where you have shared data, you want a single point of ownership of that information. It could be in a database, in an Excel file (which can be turned into CSV and periodically loaded into all the DBs), or in some other form. The specifics depend on what exactly is shared.
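If the CSV route is chosen and the target databases happen to be SQL Server, the periodic load can be as simple as a scheduled BULK INSERT; a sketch with invented names and paths:

    -- shared lookup table, refreshed from the latest CSV export
    CREATE TABLE requirement_type (
        requirement_type_id INT,
        type_name           VARCHAR(200)
    );

    TRUNCATE TABLE requirement_type;

    BULK INSERT requirement_type
    FROM 'C:\imports\requirement_types.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);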
Now in this case it sounds like you are going to have some sort of legal department in charge of some shared information; they will manage that data, which will then be shared with the other sites. This might be done with an application they manage that aggregates information from the other companies, or it could be data that is pushed to their systems.
A final point:
Software is at its best when it facilitates human solutions to human problems, not when it tries to solve those problems directly. In these cases, you probably want a good human solution in place and then to look at what software can do to support that. A lot of the issues (who owns the information?) will already have been solved and you will be simply automating what is already done.
I'd like to create an in-house solution to store marketing segment, list, campaign, and communication data. Right now nothing is centralized/standardized. Data is located on a variety of SQL servers, Access databases, and Excel spreadsheets. It's been a real pain when it comes to reporting/tracking.
I'm in a Microsoft SQL Server environment and have access to:
Microsoft Access
Microsoft SQL Server Management Studio
Microsoft Business Intelligence Development Studio
Security and compliance are pretty restrictive in my environment. Purchasing a third-party software package doesn't appear to be an option. I may be able to have a SQL Server sandbox environment created for my use.
I'm curious what suggestions you would recommend and why. I need to think about all aspects including existing data retrieval/parsing (some on a continuing basis), data import into the new marketing datamart, and reporting. Some kind of GUI may be required as there isn't currently one for tracking/categorizing much of the data. One other individual may need access to help with regular imports to help spread workload.
Thanks.
Your requirements are incomplete:
existing data retrieval/parsing (some on a continuing basis),
data import into the new marketing datamart, and reporting.
Some kind of GUI may be required as there isn't currently one for tracking/categorizing much of the data.
One other individual may need access to help with regular imports to help spread workload.
Here's your question: "I'm curious what suggestions you would recommend and why"
Here's the answer.
Who will use this? Exactly who? Call every single one of them and talk to them about what they do.
What is the business value? "solution to store marketing segment, list, campaign, and communication data" is a bad idea. No one wants a "solution" -- they want to get their jobs done. Few people have "problems" that need "solutions". They are already perfectly able to do their job. The best you can do is make them more efficient. Do they care about their personal efficiency? I doubt it.
Who has the problem? What problem do they have?
Think of a "Bacon and Eggs Breakfast". The chicken lays an egg and walks away. The pig, however, is totally committed to the bacon.
Find the pigs and chickens. Your data is not going to identify your actors and your business problem. Find the people who have the problem. Find out how big and costly the problem is. Be absolutely sure you've found the real problem that is costing someone real money. Cozy up to the person who's losing the most money and make sure they want their problem solved. They are the pig -- they can be totally committed.
Eventually, you'll want to build one central SQL/Server database and get rid of Excel Spreadsheets and MS-Access databases. You might want to have an MS-Access front-end which provides nice applications that use SQL/Server.
If you are actually talking about a data warehouse, then you must next read books by Ralph Kimball. [It's not clear, BTW, that a data warehouse is even relevant. With no problem to solve and no user who has the problem, a data warehouse is just as bad an idea as a web services framework or a new Bentley for me (Black and Silver, thank you).]
If you want to build a data warehouse, your "existing data retrieval/parsing" and "data import into the new marketing datamart" will be called ETL.
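The "L" of that ETL often boils down to a MERGE from a staging table into the datamart. A minimal T-SQL sketch, with all names invented and both tables assumed to exist with these columns:

    MERGE dbo.dim_campaign AS target
    USING staging.campaign_import AS source
        ON target.campaign_code = source.campaign_code
    WHEN MATCHED THEN
        UPDATE SET target.campaign_name = source.campaign_name,
                   target.start_date    = source.start_date
    WHEN NOT MATCHED THEN
        INSERT (campaign_code, campaign_name, start_date)
        VALUES (source.campaign_code, source.campaign_name, source.start_date);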
Your "and reporting" will be moved from an "oh, by the way" to the single most important feature of whatever it is you're doing.
Your "Some kind of GUI" will go away. Data warehousing is not much of a GUI thing. Reporting is as close as you get. Maybe you'll have to create some Master Data Management tools, but even then, those are more rules than interactions.
"One other individual may need access" Really? The end users are what? Chopped liver? They need query access, too, or they won't see any of your data.
You indicate that a purchase of third-party software is not an option; however, you should consider that the purchase may in fact be cheaper than the in-house development effort, and may bring you to a working system much faster, with fewer in-house skills required.
If I am building a CRM web application to sell as a membership service, what is the best method to design and deploy the database?
Do I have one database that houses hundreds of records per table, or do I deploy multiple databases for different clients?
Is it really an issue to use a single database since I believe sites like Flickr use them?
Multiple clients is called "multi-tenant". See for example this article "Multi-Tenant Data Architecture" from Microsoft.
In a situation like a CRM system, you will probably need to have separate instances of your database for each customer.
I say this because if you'd like larger clients, most companies have security policies in place regarding customer data. If you store their customer data in the same database as another customer's, you're running the risk of exposing one company's confidential data to another company (a competitor, etc.).
Sites like Flickr don't have to worry about this as much since the majority of us out on the Interwebs don't have such strict policies regarding our personal data.
Long term it is easiest to maintain one database with multiple clients' data in it. Think about deployment, backup, etc. However, this doesn't keep you from having several instances of this database, each containing a subset of the full client dataset. I'd recommend growing the number of databases after you have established the usefulness/desirability of your product. Having complex infrastructure is not necessary if you have no traffic....
So, I'd just put a client id in the relevant tables and smile when client 4 comes in and the extent of your new deployment is one insert statement.
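To make that concrete, with a purely illustrative schema:

    CREATE TABLE client (
        client_id   INT PRIMARY KEY,
        client_name VARCHAR(100) NOT NULL
    );

    CREATE TABLE contact (
        contact_id INT PRIMARY KEY,
        client_id  INT NOT NULL REFERENCES client (client_id),
        full_name  VARCHAR(200),
        email      VARCHAR(200)
    );

    -- onboarding client 4: the whole "deployment"
    INSERT INTO client (client_id, client_name) VALUES (4, 'New Client Ltd');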
Imagine you sell an application that stores its data in a Microsoft SQL Server database. Some customers are large companies with existing SQL Server installations and staff to maintain them. Other customers are small companies who get the SQLEXPRESS version installed by a setup program.
The database will soon contain a lot of important data and will need backing up. Whose responsibility is this?
Should the application contain a UI for backing up the database and warn when the backup has not been run?
Or should the application just include instructions for backup in its documentation?
Or is this a problem for the customers to solve and not the programmers?
In most cases I'd expect the customer to handle backups without involving the programmers. There is, of course, nothing wrong if the customer asks you as a programmer to build backup-reminders into the application.
In some cases where the customer has little or no understanding of the technology they are using I also think it is our responsibility as system developers to inform them of what they are expected to do in terms of maintenance.
Even with professional customers I fairly often specify minimum backup requirements in documentation, simply because you get what you ask for.
If you are building software that installs SQLExpress, then you should provide the backup functionality. By default, SQLExpress doesn't come with Management Studio, and I would consider it a major usability issue to install Management Studio as part of your app (or ask the customer to do so).
The BACKUP DATABASE command is simple to execute from your application. There's really no reason not to provide this functionality.
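A minimal example of what the application would issue (database name and path are placeholders); run it on a schedule or behind a "Back up now" button, and check the result so you can warn the user if it fails:

    BACKUP DATABASE MyAppDb
    TO DISK = 'C:\Backups\MyAppDb.bak'
    WITH INIT, CHECKSUM;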
As for customers who have an IT staff, they can still handle the backups themselves using their existing process.
Everybody wins.
It's the customer's responsibility. However,...
I believe you have three choices:
Mention the database should be backed up.
Give suggestions on how to backup the data.
Offer your services to maintain the backup procedures.
One or all three of these can be used. It all depends on what the buyer and seller are comfortable with.
Backups
From my experience a lot depends on how the database is installed and packaged with the application. If you sell an application to a customer with the database layer implementation details hidden or abstracted, then the typical customer expectation is that a backup solution is provided as part of the software. This expectation is there for a variety of reasons.
Off the top of my head, one of which is that the IT department supporting the application cannot be confident in how the backup of the db should happen for the application. Sometimes, a DB dump with some vendors can cause locking issues that interrupt normal application operation. I'm sure you can imagine other complaints an IT department might have for an application that bundles a DB.
As for installs that connect to an existing DB, I think it is a reasonable expectation that the DB admins for those databases handle backups. However, it is very important that you, the application provider, document what needs to be backed up. Are there sequences and indexes that need to have their precise order preserved? Are there terabytes of data created that may not be important to your customer if lost? What about the state of the application during the backup (as mentioned above)? Does the application need to be shut down?
Restores
More problematic is restores. What if a partial restore of data is needed? How could this corrupt your data store? Does your application store data anywhere else (files, network) that can be put in an unreliable/broken state as a result of a historical backup?
Anyway, long story short: when I'm a customer, I feel a lot better knowing that the application vendor has QAed the backup/restore process and provides support. In fact, this was a requirement of a large customer on my current project. They were more than capable of managing a database; rather, they wanted us to take responsibility for our own data and guarantee that the process would be flawless.
In the end, your mileage may vary depending on the implementation, requirements and industry expectations of your specific application.
My personal opinion is that instructions for backing up the data are really important. Most customers should be able to handle making backups themselves without needing a GUI, and a GUI will cost some time and money to develop.
Providing backup instructions will help make sure that they know what they need to do, and know that it is their responsibility. Good instructions will also (hopefully) reduce the amount of support they will require :) Most non-technical users I know these days at least know they need to back up their important documents to CDs; with some instructions they should be able to back up their database on their own.
I think a GUI for backups is only essential in special cases, where the users will not be technically proficient at all, data security is important, and the users won't have access to (or want to spend money on) IT staff.
Having the application run a backup solves the problem of "How do I get my data back that I accidentally deleted?", but it does not solve the problem of a crashed hard drive. Your app isn't going to make sure the backup goes to tape and that the tape gets properly stored off-site. The best solution is one that forces them to take some responsibility for their data. Don't lull them into a false sense of security just because they received a message box indicating the backup is complete.
If you're providing SQL Server Express it is important to discuss the requirement of backing up with your customers (of course only if that is possible).
A number of people here have suggested that you let the customer's IT department deal with the problem. This is of course completely valid; however, it has the inherent problem that a worrying number of IT support staff have no idea about database administration, and even less idea how to implement a decent backup strategy.
Consider carefully the needs of your individual customers, and perhaps add automatic backups as a bolt-on, or for free if you're feeling generous.
After all this is SQL Express, many customers will not expect to have to know anything about database backups.
My previous job involved maintenance and programming for a very large database with massive amounts of data. Users viewed this data primarily through an intranet web interface. Instead of having a table of user accounts, each user account was a real first-class account in the RDBMS, which permitted them to connect with their own query tools, etc., as well as permitting us to control access through the RDBMS itself instead of using our own application logic.
Is this a good setup, assuming you're not on the public internet and dealing with potentially millions of (potentially malicious) users or something? Or is it always better to define your own means of handling user accounts, your own permissions, your own application security logic, and only hand out RDBMS accounts to power users with special needs?
I don't agree that using the database for user access control is as dangerous as others are making it out to be. I come from the Oracle Forms development realm, where this type of user access control is the norm. Just like any design decision, it has its advantages and disadvantages.
One of the advantages is that I could control select/insert/update/delete privileges for EACH table from a single setting in the database. On one system we had 4 different applications (managed by different teams and in different languages) hitting the same database tables. We were able to declare that only users with the Manager role were able to insert/update/delete data in a specific table. If we didn't manage it through the database, then each application team would have to correctly implement (duplicate) that logic throughout their application. If one application got it wrong, then the other apps would suffer. Plus you would have duplicate code to manage if you ever wanted to change the permissions on a single resource.
Another advantage is that we did not need to worry about storing user passwords in a database table (and all the restrictions that come with it).
I don't agree that "Database user accounts are inherently more dangerous than anything in an account defined by your application". The privileges required to change database-specific privileges are normally MUCH tougher than the privileges required to update/delete a single row in a "PERSONS" table.
And "scaling" was not a problem because we assigned privileges to Oracle roles and then assigned roles to users. With a single Oracle statement we could change the privilege for millions of users (not that we had that many users).
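For anyone unfamiliar with that model, it boils down to statements like these (object and user names are invented):

    CREATE ROLE manager_role;
    GRANT SELECT, INSERT, UPDATE, DELETE ON app.orders TO manager_role;

    -- each user just receives the role
    GRANT manager_role TO tjones;

    -- one statement later changes the rule for every user holding the role
    REVOKE DELETE ON app.orders FROM manager_role;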
Application authorization is not a trivial problem. Many custom solutions have holes that hackers can easily exploit. The big names like Oracle have put a lot of thought and code into providing a robust application authorization system. I agree that using Oracle security doesn't work for every application. But I wouldn't be so quick to dismiss it in favor of a custom solution.
Edit: I should clarify that despite anything in the OP, what you're doing is logically defining an application even if no code exists. Otherwise it's just a public database with all the dangers that entails by itself.
Maybe I'll get flamed to death for this post, but I think this is an extraordinarily dangerous anti-pattern in security and design terms.
A user object should be defined by the system it's running in. If you're actually defining these in another application (the database) you have a loss of control.
It makes no sense from a design point of view because if you wanted to extend those accounts with any kind of data at all (email address, employee number, MyTheme...) you're not going to be able to extend the DB user and you're going to need to build that users table anyway.
Database user accounts are inherently more dangerous than anything in an account defined by your application because they could be promoted, deleted, accessed or otherwise manipulated by not only the database and any passing DBA, but anything else connected to the database. You've exposed a critical system element as public.
Scaling is out of the question. Imagine an abstraction where you're going to have tens or hundreds of thousands of users. That's just not going to be manageable as DB accounts, but as records in a table it's just data. The age-old argument of "well, there's only ever going to be X users" doesn't hold any water with me, because I've seen very limited internal apps become publicly exposed when the business felt it could add value to the customer, or when the company got bought by a giant partner who now needs access. You must plan for reasonable extensibility.
You're not going to be able to share connection pooling, you're not going to be any more secure than if you just created a handful of e.g. role accounts, and you're not necessarily going to be able to effect mass changes when you need to, or back up effectively.
All in there seems to be numerous serious problems to me, and I imagine other more experienced SOers could list more.
I think that, generally, in your traditional database application they shouldn't be, for all the reasons already given. In a traditional database application there is a business layer that handles all the security, and this is because there is such a strong line between the people who interact with the application and the people who interact with the database.
In this situation it is generally better to manage these users and roles yourself. You can decide what information you need to store about them, and what you log and audit. And most importantly, you define access based on pure business rules rather than database rules. It's got nothing to do with which tables they access and everything to do with whether they can [insert business action here]. However, these are not technical issues; these are design issues. If that is what you are required to control, then it makes sense to manage your users yourself.
You have described a system where you allow users to query the database directly. In this case, why not use DB accounts? They will do the job far better than you will if you attempt to analyse the queries that users write and vet them against some rules that you have designed. That, to me, sounds like a nightmare system to write and maintain.
Don't lock things down just because you can. Explain to those in charge what the security implications are, but don't attempt to prevent people from doing things simply because you can, especially not when they are used to accessing the data directly.
Our job as developers is to enable people to do what they need to do. And in the situation you have described, specifically connecting to the database and querying it with their own tools, I think that anything other than database accounts is either going to be insecure or unnecessarily restrictive.
"each user account was a real first-class account in the RDBMS, which permitted them to connect with their own query tools, etc.,"
not a good idea if the RDBMS contains:
any information covered by HIPAA or Sarbanes-Oxley or The Official Secrets Act (UK)
credit card information or other customer credit info (POs, lines of credit etc)
personal information (ssn, dob, etc)
competitive, proprietary, or IP information
because when users can use their own non-managed query tools the company has no way of knowing or auditing what information was queried or where the query results were delivered.
oh and what #annakata said.
I would avoid giving any user direct database access. Later, when this starts causing problems, taking away their access becomes very difficult.
At the very least, give them access to a read-only replica of the database so they can't kill your whole company with a bad query.
A lot of database query tools are very advanced these days, and it can feel a real shame to reimplement the world just to add restrictions. As long as the database user permissions are properly locked down, it might be okay. However, in many cases you can't do this, and you should instead be exposing a high-level API over the database that inserts objects across many tables properly, without the user needing specific training ("just add an address into that table there - why isn't it working?").
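By "high-level API" I mean something like a stored procedure that hides the multi-table insert. A T-SQL sketch with made-up names (it assumes dbo.customer has an IDENTITY key and dbo.address references it):

    CREATE PROCEDURE dbo.add_customer_with_address
        @name   NVARCHAR(200),
        @street NVARCHAR(200),
        @city   NVARCHAR(100)
    AS
    BEGIN
        SET NOCOUNT ON;
        BEGIN TRANSACTION;

        DECLARE @customer_id INT;

        INSERT INTO dbo.customer (name) VALUES (@name);
        SET @customer_id = SCOPE_IDENTITY();

        INSERT INTO dbo.address (customer_id, street, city)
        VALUES (@customer_id, @street, @city);

        COMMIT TRANSACTION;
    END;

Users (or a thin front-end) call the procedure instead of touching the tables directly, so the "which table does the address go in?" question never comes up.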
If they only want to use the data to generate reports in Excel, etc, then maybe you could use a reporting front end like BIRT instead.
So basically: if the users are knowledgeable about databases, and the resources to implement a proper front-end are low, keep on doing this. However, if the resources do come up, it is probably time to gather people's requirements and create a simpler, task-oriented front-end for them.
This is, in a way, similar to: is sql server/AD good for anything
I don't think it's a bad idea to put your security model, at least a basic one, in the database itself. You can add restrictions in the application layer for cosmetics, but whichever account the user is accessing the database with, be it based on the application or the user, it's best if that account is restricted to only the operations the user is allowed to perform.
I don't speak for all apps, but there are a large number I have seen where capturing the password is as simple as opening the code in notepad, using an included dll to decrypt the configuration file, or finding a backup file (e.g. web.config.bak in asp.net) that can be accessed from the browser.
"not a good idea if the RDBMS contains:
any information covered by HIPAA or Sarbanes-Oxley or The Official Secrets Act (UK)
credit card information or other customer credit info (POs, lines of credit etc)
personal information (ssn, dob, etc)
competitive, proprietary, or IP information"
Not true, one can perfectly manage which data a database user can see and which data it can modify. A database (at least Oracle) can also audit all activities, including selects. To have thousands of database users is also perfectly normal.
It is more difficult to build good, secure applications because you have to program this security yourself; a database offers this security out of the box, and you can configure it in a declarative way, with no code required.
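A couple of illustrative Oracle-style statements (with invented object names) show how declarative this is:

    -- audit every SELECT against a sensitive table
    AUDIT SELECT ON hr.employee_salaries BY ACCESS;

    -- restrict which columns a user may modify, with no application code
    GRANT UPDATE (phone_number, office_location) ON hr.employees TO hr_clerk;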
I know I am replying to a very old post, but I recently came across the same situation in my current project. I was also thinking along similar lines: should application users be database users?
This is what I analysed:
It definitely doesn't make sense to create that large a number of application users on the database (if your application is going to be used by many users).
Let's say you created X (a huge number) of users on the database. You are opening a clear gateway into your database.
Let's take a scenario for the solution:
There are two types of application users (Managers and Assistants). Both need access to the database for some transactions.
Obviously you would create two roles in the database, one for each type (Manager and Assistant). But what about the database user the application connects with? If you create one account per application user, then the number of accounts on the database grows linearly with the number of users.
What I suggest:
1. Create one database account per role (let's say Manager_Role_Account).
2. Let your application have business logic to map an application user to the corresponding role (user Tom with the Manager role maps to Manager_Role_Account).
3. Use the database user corresponding to the role identified in #2 (Manager_Role_Account) to connect to the database and execute your queries, as in the sketch below.
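A sketch of this setup, assuming PostgreSQL syntax and using invented names (the app.approvals table is just a placeholder):

    -- one database account per application role, not per user
    CREATE ROLE manager_role_account   LOGIN PASSWORD 'change_me';
    CREATE ROLE assistant_role_account LOGIN PASSWORD 'change_me';

    GRANT SELECT, INSERT, UPDATE, DELETE ON app.approvals TO manager_role_account;
    GRANT SELECT ON app.approvals TO assistant_role_account;

    -- the user-to-role mapping is ordinary application data
    CREATE TABLE app_user (
        user_name TEXT PRIMARY KEY,
        app_role  TEXT NOT NULL   -- 'MANAGER' or 'ASSISTANT'
    );

The application looks up Tom's role, picks the matching connection (Manager_Role_Account), and runs his queries on it.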
Hope this makes sense!
Updated: As I said, I came across a similar situation in my project (a PostgreSQL database at the back end and a Java web app at the front end), and I found something very useful called proxy authentication.
This means that you can log in to the database as one user but limit or extend your privileges based on the proxy user.
I found some very good links explaining this:
For PostgreSQL: Choice of authentication approach for financial app on PostgreSQL
For Oracle: Proxy Authentication
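In PostgreSQL the same effect can be sketched with SET ROLE: the application logs in as one proxy user and then switches to the role that matches the authenticated end user (names below are illustrative):

    CREATE ROLE app_proxy LOGIN PASSWORD 'change_me';

    CREATE ROLE manager_role NOLOGIN;
    GRANT SELECT, INSERT, UPDATE, DELETE ON app.approvals TO manager_role;
    GRANT manager_role TO app_proxy;

    -- after the application authenticates the end user:
    SET ROLE manager_role;
    -- ... run that user's statements with the role's privileges ...
    RESET ROLE;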
Hope this helps!
It depends (like most things).
Having multiple database users negates connection pooling, since most libraries handle pooling based on connection strings and user accounts.
On the other hand, it's probably a more secure solution than anything you or I will do from scratch. It leaves security up to the OS and database server, which I trust much more than myself. However, this is only the case if you go to the effort of configuring the database permissions well. If you're using a bunch of OS/DB users with the same permissions, it won't help much. You'll still get an audit trail, but that's about it.
All that said, I don't know that I'd feel comfortable letting normal users connect directly to the database with their own tools.
I think it's worth highlighting what other answers have touched upon:
A database can only define restrictions based on the data, i.e. restrict SELECT/INSERT/UPDATE/DELETE on particular tables or columns. I'm sure some databases can do somewhat cleverer things, but they'll never be able to implement business-rule-based restrictions the way an application can. What if a certain user is allowed to update a column only to certain values (say < 1000), or only to increase prices, or to change either of two columns but not both?
I'd say unless you are absolutely sure you'll never need anything but table/column granularity, this is reason enough by itself.
This is not a good idea for any application where you store data for multiple users in the same table and you don't want one user to be able to read or modify another user's data. How would you restrict access in this case?