I have a database-driven site developed with MVC and Entity Framework code first. The database is rather large and contains all the data that I would need for an additional web application. What are the implications of setting up a new website with database first using the same existing database? What I am really trying to ask is whether it would be a bad idea to share a database between two web applications where both are querying and doing updates to the data. Will this slow down processing on the original site or possibly lock up data, etc.? Both sites would be running on the same machine...
TIA
If sharing of the same data between both application is important i.e. you want the data to be shared between applications - than you have to use the same database. It'll slow down processing, but if it's the requirement, then you have to.
There's nothing stopping you from having two applications accessing the database. They are built to have multiple connections with multiple people accessing them. So there aren't many risks involved. You probably won't even notice the speed difference.
The two biggest risks I can think of
if both applications edit a record in the database, the one that submitted data last will win unless you put business logic in place to prevent that from happening
if the database schema is updated, both applications need to be updated to reflect the new schema to let it access the new data, or edit the data successfully
Related
Imagine a large corp with dozens of companies, each with their own website and each website will have their own unique functional requirements
Most data on each website will be specific to that website
Each website can edit its own data
Some data will be shared across all websites
There will be a central CMS that is allowed to edit this data, but other websites can read and use that data
e.g. say you're planning the infrastructure for a company that owns multiple sub-companies that make different kinds of products, some in the same category (cereal, food), others in completely different categories (books, instruments). Some are marketing websites, some are for CRM, some are online stores
there are a list of regulatory requirements that affect all products
each company should manage the status of compliance of its own products to each requirement
when a new requirement surfaces, details regarding that requirement should only be entered once
How would the multiple databases be coordinated?
edit: added more info per Bob's suggestions
Thanks for the incredibly insightful questions!
compliance data is not shared, silo'd within each site
shared data is only on the one enterprise-wide database, they will mostly be "types of [thing]"
no conclusive list of instances where they'll be used but currently it'd be to populate CMS dropdowns for individual sites.
changes to shared data would occur a few times a year.
Ideally changes would be reflected within a few minutes, but an hour or so should be acceptable
very low volume in shared data.
All DBs will be new, decision on which DB is pending current investigation.
Sub-systems will expose REST api
Here are some ways I have seen this handled, you need to think about the implications of each structure based on the details of your particular business domain. All can work, but all have to be carefully set up if they are going to work.
One database for shared information and one for each client for client-specific information. Set up the overall application so that the first thing you put in the application on log in is the client and it connects to the correct client. People might have to also have a way to change the client if users will handled multiples.
Separate servers for each client if they completely need to be siloed. Database changes are by script (and in source control) and are applied to each server as need be. So the changes to the central database might have a job that runs to push any data changes to the other servers
All the data in one database, but making sure each table has a client_id so that the data is always filtered correctly by client. You can set up separate views by client, so that the users can only see the clients they are supposed to see. This only works if the data for each client is substantially in the same form.
And since you are in a regulatory environment, I strongly urge that you create an audit database that is updated by database triggers (never audit from the application, you will lose changes to the data) for each database.
I agree with Chris that, even after both the sets of questions, there is still a big set of possible solutions. For instance, if the databases were the same technology, and the shared data were stored in the same way in each one, you could do db-level replication from the central db to the others. Is it OK to have 2 separate dbs per application (one with shared stuff and one with not-shared?) - this would influence the kind of replication.
Or you could have a purely code solution, where clicking publish in a GUI that updates the central db calls a set of APIs that also update the other dbs. Or micro-services - updating the central db also creates a message on a shared queue, that is picked up by services that each look after a different db and apply the updates in whatever form makes sense for that db.
It depends on (among the things already mentioned) what your organisation's technology strategy is, what technology and skills you already have in-house, and so on.
So this is as much an architecture question as it is a db question.
I don't think this question is sufficiently clear to get a single answer. However there are a few possibilities.
In many cases, where you have shared data you want to have a single point of ownership of that information. It could be in a database, in an excel file (which can then be turned into csv and periodically loaded on all dbs), or some other form. The specifics depend on what is shared exactly.
Now in this case it sounds like you are going to have some sort of legal department in charge of some shared information and they will manage that data, which will then be shared to the other sites. This might be done with an application they manage which aggregates information from the other companies or it could be data which is pushed to their systems.
A final point:
Software is at its best when it facilitates human solutions to human problems, not when it tries to solve those problems directly. In these cases, you probably want a good human solution in place and then to look at what software can do to support that. A lot of the issues (who owns the information?) will already have been solved and you will be simply automating what is already done.
Are there any resources available that can guide someone on how to 'think' about the various components of a hosted / cloud solution before going ahead and starting to make a hosted application? If that made no sense, what I mean to ask is are there any guidance books/websites on what things need to be considered when making a cloud application?
I am attempting to make a hosted CRM-style software application that will serve many hundreds of customers. The application is powered by a SQL server database with many tables and a ColdFusion, HTML5, CSS, Javascript front-end. If I was installing this application and its components at each client site, then each installation is unique to that customer. But somehow I have to replicate this uniqueness in the cloud which is baffling me.
Only two things have come to mind so far:
The need for a unique database per customer in SQL server
The need to change DB connection strings per customer in the web application
My thought process has come to a block when I am trying to envisage how to design the application to serve so many different customers. Even though the application that all customers use will is the same (same DB tables, same front-end), the data that they store and retrieve will be specific to them. So I was thinking that surely each customer needs a separate database creating for them? Is it feasible to create a replica database for each customer? If I need to update some tables or add a new table, how would I do this for hundreds of different databases?
From the front-end I guess each unique customer log-in would change DB connection strings so that they can only access their database. Other than this I can't think of anything else that needs to change per customer basis.
When a new customer wants to sign up, it needs to be clear to me what I need to create for them to have access to the application. I guess this is ultimately what I need to think of but I'm stuck.
If anyone can suggest some things to think of or if there is a book or website on this kind of thing that someone could point me to I'd really be very thankful.
EDIT:
I was looking at an article about Salesforce.com and it says
"In order to ensure privacy of data for each user and give an effect of each having their own database, the data from different users are securely isolated from one another."
Anyone know how this is achieved or how it may be done?
Found some great information here. It is called multi-tenant database design and seems to be a common topic. Once I get the database designed then the application can sit nicely on top.
https://dba.stackexchange.com/questions/1043/what-problems-will-i-get-creating-a-database-per-customer
First please excuse me for my grammar mistakes.
Ok, this is what I already know :-),
I want to use EF and MVC 4, UI with angularJs, I need a Database per user \ group of users,
my application growth may come to 5000+ users, they all have also a shared resource which is a single
database, when the user search for something the results will come both from the shared resource
and from the user own database.
Performance is extremely important.
In my research I found that EF can connect to different databases but i couldn't find any proper way of doing so without writing tons of code.
Scenarios :
New user registers, the system builds a new database for him.
New user logs in, the system returns data from his database and the shared database.
New user logs in, BUT, the system database got upgraded, users db should too.
Now I know that there is no easy method to achieve all of my goals,
but can you please direct me to what suits me best?
Again sorry for my English!
Thank you! :-)
IMHO, we have worked in several SaaS Applications that have been using a shared database [central repository] that will contain all the user [tenant] data and that there will be an application database [tenant based] for every user group.
This will work with ease in EF and there will be no performance overheads. You should not be using cross database queries and instead focus on the optimization of the EF code that you may have in the data access layer and then you can have separate services that will handle the task of merging the user data from the shared and separate databases.
May be you should analyze the application and then find the data that may be non-frequently updated and cache them and get them rendered using a distributed cache like Appfabric.
With respect to the synchronization of the User db and the centralized database, in the service tier, you can get this job done by wrapping these calls in a .Net Transaction scope and then the preserve the atomicity.
Please post your understanding and any further clarifications in my reply.
I am currently designing a web application where I will have customers signing up as companies. Each company will have its own set of users. As I am designing this I am wondering which approach would work best. I see sites like fogbugz or basecamp which use subdomains. In cases with subdomains do you have a database instance per sub domain? I'm wondering if it is recommended to have a database instance per company or if I should have some kind of company table and manage the company and user data/credentials all from one database.
Which approach is best? Is there literature on this subject (i.e. any web or book)?
thanks in advance!
You have to weigh up your options, as some of this will be a matter of opinion and might not be feasible for your implementation.
That being said, I'd consider the single database approach, for these reasons:
Maintenance: when running a database per registered 'client', you will very easily reach a situation where any changes or upgrades you make to your app's schema have to be applied to every single database instance. This will get ridiculous, fast.
Convenience: You might want analytics and usage stats, or some way to administrate all these databases. Querying a single database is comparatively trivial to trying to aggregate the same query for all your databases. This isn't going to scale.
Scalability *: As mentioned in 2, you're going to require a special sort of aggregation to query things about your clients, and your app as a whole. The bigger your app gets, the more complex your querying. The other issue is, if one client uses the app a lot more than another, what will you be encouraged to optimise? Your app, the bigger client's database, or the smaller client's? Not forgetting anything you do change has to be copied to all databases.
Backups: You can backup one database easily, just by creating a dump and stashing it somewhere. Get a thousand clients and now you have to run 1000 database dumps, and name them well enough to be able to identify them if one single database corrupts. How will you even know if this happens? Database errors will be localised to that specific one, as opposed to your entire app.
UI: A user signs up or is invited to use your app, and belongs to one particular client. Are you going to save that user account to the client's database? If so, see scalability for the issue of working with that data when the user wants to change their password, or you want to email them. So, do you tell the user to let you know which database they're in so you can find them?
Simplification: You have a database per client and want to just use a single one. How do you merge them all together without significantly breaking things? There'll be primary key conflicts if you use auto incremented IDs; bookmarked URLs will break if you decide to just regenerate the keys; foreign keys across tables will no longer point to the right records. Your data integrity will go down the pan.
You mention 'white label' services that offer their product through custom subdomains. I'm not privy to how these work, but the subdomain is only a basic CNAME or A record in their DNS zonefile. The process of adding these can be automated, and the design of the application and a bit of server configuration can deal with linking these subdomains to the correct accounts and data. They're just URLs, so maybe on the backend, the app doesn't differentiate between:
http://client.example.com
http://example.com/client
Overall though, you may decide that all these problems are things you can and would prefer to deal with. Be warned, however, that by doing so you may be shooting yourself in the foot, and you can gain a lot more from crafting a well-designed single database schema and a well-abstracted front-end.
*#xQbert mentions the very real benefit of scalability with multiple databases. I've amended this answer to clarify that I was more concerned with other aspects.
Here's my problem. I built a web app, and naturally kept the data in a database which describes that app's domain. Afterwords, I built another web app for the same organization, and used a seperate database to describe that app's domain and store data... and naturally a couple more projects came up and for each app I've isolated it's data to a single database. Deveolpment wise, I think it's ok, as I can maintain changes to the data structure and data at the app's database.
Considering these apps belong to the same organization, there tends to be plenty of data replicated between them, like department names, job titles, shop names, etc. Most of these tables hold the same data, but are not exactly the same in each database, and are not always used by all of the apps. Changes to this data, though, needs to be changed at all the apps (sometimes in a diferent ways) creating a growing management "hassle".
So I've been think of a way to get some syncronization between the data. I want an easier management - update at one app (or a central app) and update all the databases as needed by each app - and also a better way to share data between apps (like maybe mash up data from differnt apps in a new app to alow specific analysis). Most of the data I'm refering to is used as contraints more than being core domain concept, describing the organization rather than describing a particular domain.
I'm looking for opinions on some ways to get this done.
My first idea was to grab comun data structures, like the department names' table i mentioned, and stick'em in a core database. Any updates to the data would be done at this database, through a dedicated web app, and I'd apply some sort of Observer or Publisher / Suscriber Pattern for these changes - on changes the app would notify observing apps (through there dedicated webservice) that the changes occured and allow for the app to grab the new data and use it as it needs. GUIDs could be user as a reference to identify the same data throughout the apps. Also, I could build web services for read and search operations that don't need to be in a specific app's database, but could be useful to it.
A second idea would be that each app manage it's own data, and the apps could observe one another. A change in one could notify others that share the same data structure that the change occurred. I could still use some GUIDs and even build services on any of the apps. I think this would also be less excessive in terms of duplication of data, but might be harder to manage as each app would eventually be coupled to other apps, and I would some how have to distribute responsabilities as to which app controls what information.
I'm really curious as to something of this genre of data distibuition and syncing would work and even be recomended. Opions and other ideas are more than welcome!
What you describe here is a typical case for a "Master Data Management" system. EAI vendors (Oracle, TIBCO, IBM) offer such products. They resemble your first solution, being centralised databases with synchronization processes, detecting changes in external data sources, grabbing the changes and synchronizing data out to other external databases. They also provide a user interface to change master data directly.
MDM software are expensive, but you can implement a custom solution which will be - at least initially - cheaper than purchasing one. Both of your solutions make technical sense but there is a difference in their manageability.
The first one is better, if you can dedicate a responsible person/organization to take care of it and the business owners of your services can agree on making changes via this new centralised system.
The second solution shares the responsibility between the service owners. The hard task here is to identify the owner of each type of information (business object).
I cannot advise a solution without a deeper knowledge of your systems and organizations, but I hope I could give some ideas.