Fresh build of an application using multiple databases - database

I'm a student and I have a question about architecture.
Is it common to use multiple database connections in a Java application when you are in the first stage of the development process?
Best regards,
Erik Student

Hello Erik and welcome to StackOverflow.
To answer your question:
That very much depends on the architecture/use cases of the application. A couple of examples that could motivate the use of multiple database connections are:
Needed data is stored/owned in different locations
Microservice architecture (https://smartbear.com/learn/api-design/what-are-microservices/)
Parts of data are used by multiple applications (splitting into multiple databases for load distribution)
Do note that the distribution of data comes with some disadvantages, such as syncing data between databases (foreign keys could be hard to manage), and data mismatch between applications/application states.
Further, you can always start with a single database and split it later, as long as your data schema allows some flexibility between tables; for example, don't mash all your data into a single table.
To give a definite answer to your question we would need to know more about the environment/architecture of the application.
I hope this helps you somewhat :)

Related

When is a flat DB design acceptable

When is it OK to use a flat DB table design nowadays? Ever? What I mean is: when is it OK to abandon the wisdom of relational database design and revert to a flat table structure that incorporates no links, adding extra columns to hold more data when we should be creating a key to another table to store multiple rows?
I'm working on some ideas to discuss with a product management team. When I initially asked the question "Why are all these tables flat in nature" I was told that
"Read centric databases display better performance with a flat table structure."
I struggle with this explanation because a flat design presents so many barriers to progress down the road.
Thoughts?
"Read centric databases display better performance with a flat table structure." This statement says table won't/rarely be used to insert/update/delete operations. In that case table must be properly indexed to get good performance. Since there won't be any kind of joins so table would be using lot of filters in where clause hence indexing is really important to be used appropriately.
This kind of scenario is usually used in data warehouses. When we designs warehouses, we usually eliminates primary/foreign keys and uses business primary keys. This is because of huge database in wareshouse.
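For illustration, here is a minimal T-SQL sketch of the kind of indexing this answer describes for a wide, read-mostly flat table; the table and column names (Reviews, City, Category, Rating, Title) are invented for the example, not taken from the question.
-- Hypothetical flat, read-mostly table: no joins, queries filter directly on columns
CREATE TABLE Reviews (
    ReviewId INT IDENTITY PRIMARY KEY,
    City     NVARCHAR(100),
    Category NVARCHAR(100),
    Rating   INT,
    Title    NVARCHAR(200),
    Body     NVARCHAR(MAX)
);

-- Index the columns used in WHERE clauses; INCLUDE covers the selected columns
-- so frequent read queries can be served from the index alone
CREATE NONCLUSTERED INDEX IX_Reviews_City_Category
    ON Reviews (City, Category)
    INCLUDE (Rating, Title);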
Never.
Whatever problem you think you are going to solve by ignoring relational database theory, you will only create many more intractable problems. Furthermore, the original problem that you attempt to avoid by ignoring relational theory will invariably be based on a misconception anyway.
Short answer: Almost always!
Your website almost never needs a conventional database!
After 20 years of working as an IT admin on big and small projects, I can say with confidence that over 90% of today's websites do not need a database AT ALL.
It's just another layer of obfuscation that most companies and people can do without.
Face the facts, people. Most websites out there don't get a single hit in a day, so talking about database performance is quite silly when it comes to the HUGE majority of websites today (2019).
That means that over 90% of these sites could and should switch to some flat-file CMS/CRM like PageKit, Grav or Bludit (Bludit is my personal favorite because of its minimalistic approach: it disdains even a flat DB and uses ordinary folders to contain articles in HTML files).
I never did figure out why CMS leaders like WordPress and Joomla insist on complicating their default setup by forcing their users into a database connection and configuration that is often the reason the site malfunctions. Only when a site actually needs some type of DB, for instance because it has many user accounts, is a DB warranted. Still, most websites have only a handful of user accounts.
Many times we see a site down because the database engine is down or can't handle so many simultaneous connections, while the Apache or Nginx web server is still up and running.
Don't just follow others blindly. Time to be brave and lead.

How to design a DB for several projects

I'm wondering what would be the best way to organize my DB. Let me explain:
I'm starting a new "big" project. This big project will be composed of a few little ones. In general the little projects are not related to each other; they are just features of the big one.
One thing that all the projects have in common is the users that are going to use them.
So my questions are:
Should I create a different DB for each one of the little projects (currently each project will contain 4-5 tables)?
How do I deal with the users? Should I create one DB for all the users, or should I duplicate the users table in every DB? Keep in mind that the information about the users is used a lot in every little project; it's NOT only for identification purposes.
Thanks in advance for your advice.
This greatly depends on the database you choose to use.
If these "sub-projects" are designed to work as one coherent unit, then I strongly recommend you keep it all in the same database. One backup, one restore, one unit.
For organizational purposes, if you are using a database which supports it, select a different Schema per project. PostgreSQL and SQL Server are two databases (among others) which support this effortlessly.
In the case of a database like MySQL, I recommend you pick a short prefix for each subproject and prefix all tables accordingly. "P1_Customer" for example.
Shared data would go in its own schema or prefix, like Global or something like that (see the sketch below).
Actually, this was one of the many reasons we switched our main database from MySQL to PostgreSQL. We've been heavy users of both, and I really appreciate the features that PostgreSQL offers. SQL Server, if you are in a windows environment, is a great database IMO as well.
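As a rough sketch of both suggestions above (all schema, prefix and table names here are made up for the example):
-- PostgreSQL / SQL Server: one schema per sub-project, shared data in its own schema
CREATE SCHEMA p1;
CREATE SCHEMA shared;
CREATE TABLE p1.customer (customer_id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE shared.app_user (user_id INT PRIMARY KEY, name VARCHAR(100));

-- MySQL alternative: emulate the grouping with a short prefix per sub-project
CREATE TABLE p1_customer (customer_id INT PRIMARY KEY, name VARCHAR(100));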
If the little projects are "features of the big one", then I don't see a reason why you wouldn't want just one user table for the main project. The way you set up the question makes this seem true: "If there is a user A in little project 1, then there must be a user A in the 'big' project." If that is true, you should likely have the users in the big DB instead of duplicating them, unless you have more qualifying details.
I think the proper answer is 'it depends'.
Starting your organization down the path of a single centralized system is good on many levels. I think in general I would recommend this.
however:
if you are going to have dramatically different development schedules, or dramatically different user experiences with the various sub projects, then you may be better off keeping them separate.
I'd have a look at OpenID or some other single sign-on protocol depending on the nature of your application. OpenID includes a mechanism called "attribute exchange", which allows applications to retrieve profile information from the OpenID provider.
This allows you to create a central user profile repository, with an authentication scheme, and have your individual apps query that repository for profile information.
The question as to how to design your database is hard to answer without more information. In most architectures, "features" within an application tend to be closely linked - "users" are related to "accounts" are related to "organisations" etc.
I'd recommend looking at the foreign key relationships to answer this question. If you have lots of foreign keys, build a single database for all tables. If you have "clusters" of foreign keys, and you want to have a different life cycle for each application (assuming the clusters map neatly to the applications), consider separate databases.
By "life cycle", I mean mostly the development lifecycle - app 1 might deploy weekly, app 2 monthly, app 3 once only and then be frozen.

Basic Database Question?

I am interested in knowing a little bit more about databases than I currently know. I know how to set up a database backend for any webapp that I happen to be creating, but that is all. For example, if I was creating three different apps I would simply create three different databases and then configure each database for the particular app. This is all simple knowledge, and I would now like to have a deeper understanding of how databases actually work.
Let's say that I developed an application, for example, that needed a lot of space and processing power. This database would then have to be spread over numerous machines. How exactly would a database be spread across numerous machines and still be able to write records and then retrieve them? Would each table get its own machine, and what software is needed to make sure that the different machines have all performed their transactions successfully?
As you can see, I am quite a database ignoramus, lol.
Any help in clearing this up would be greatly appreciated.
I don't know what RDBMS you're using but I have two book suggestions.
For theory (which should come first, in my opinion): Database in Depth: Relational Theory for Practitioners
For implementation: High Performance MySQL: Optimization, Backups, Replication, and More
I own both these books and they are both pretty great, especially the first one.
That's quite a broad topic... You might want to start with Multi-master replication, High-availability clustering and Massively parallel processing.
If you want to know how to keep databases running under ever-increasing load, then it's not a basic question. Several well-known web companies are struggling to find the right way to make their databases scalable.
Using memcached to cache database information is one way to decrease load on your database if your application is read-intensive. If your application is write-intensive then maybe you would want to consider using a NoSQL datastore like MongoDB or Redis.
Database Design for Mere Mortals
This is the best book about the subject if you don't have any experience with databases. It's got historical background and practical examples. Most books often skip the historical stuff because they assume you know what a db is, or it doesn't matter, and jump right to the practical. This book gives you the complete picture.

Database tables - how many databases?

How many databases are needed for a social website? I have my tech team working on developing a social site, but all their tables are in one database. I wanted to create separate table sets for user data, temporary tables, etc., and was thinking of maybe having one separate database only for critical data, but I am not a tech person and am not sure how this works. The site is going to be a local reviews website.
This is what happens when management tries to make tech decisions...
The simple answer, as always, is as few as possible.
The slightly more complicated answer is that once you begin to push the limits of your server and start to think about multiple servers with master/slave replication, you may want your frequently written tables separated from your seldom-written tables, which will lower the master-slave update requirements.
If you start using separate databases you can also run into an issue with your backup/restore strategy. If you have 5 databases and back up all five, what happens when you need to restore one of them? Do you then need to restore all five?
I would opt for the fewest number of databases.
The reason you would want to have multiple databases is for scaling out to multiple machines, which matters in the context of a "social application" where large volume / high availability is a concern. If you anticipate the need to scale out to multiple machines to handle high volumes, then the tables you break out should be those that logically need to stay together.
So, for example, maybe you want to keep tables related to a specific subject area (maybe status updates) together in one database and other tables that are related to a different subject area (let's say user's picture libraries) together in a different database.
There are logical and performance reasons to keep tables in separate physical or logical databases.
What is the reason that you want it in different databases?
You could just put all tables in one database without a problem, even with for example multiple installations of an open source package. In that case you can use table prefixes.
Unless you are developing a really BIG website, one database is the way to proceed (by the way, did you consider the possible issues that may arise when working with multiple databases?).
If you are worried about performance, you can always configure different tablespaces on several storage devices in order to improve timings.
If you are worried about security, just increase it (better passwords, no direct root login, no port forwarding, avoid tunneling, etc.)
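For example, the tablespace idea above looks roughly like this in PostgreSQL (the path and names are placeholders, and CREATE TABLESPACE needs superuser rights and an existing empty directory):
-- Put a hot, frequently read table on a separate (e.g. faster) storage device
CREATE TABLESPACE fast_disk LOCATION '/mnt/ssd1/pgdata';
CREATE TABLE reviews (
    review_id BIGSERIAL PRIMARY KEY,
    body      TEXT
) TABLESPACE fast_disk;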
I am not a tech person, only doing the functional analysis, but I own the project so I need to oversee the tech team. My reasons for having multiple databases are security and performance.
Since this is going to be a new startup, there is no money to invest in strong security or getting the database designed flawlessly. Plus there are currently no backup policies in place, so:
1) I want to separate critical data like user passwords/basic profile info, then separate out user media (photos they upload to their profile) and then the user content. Then separate out the system content. The current design has two layers of tables: master tables for the entire system and module tables for each individual module.
2) Performance: There are a lot of modules being designed, and this is a data-intensive social site with lots of reporting/analytics built in, so lots of reads/writes. Maybe it's better to distribute load across databases based on purpose?
Since there isn't much funding, I want to get it right the first time with my investment so the database can scale and work well until revenue comes in to actually invest in getting it right. Of course that could be maybe 6 months and, say, a million users away too.
Oh, and there is a plan to add staging/production modes as well, so separate or the same database?
You'll be fine sticking with one database for now. Your developers can isolate/separate application data by making use of database schemas. Working with multiple databases can quickly become a journey through a world of pain and is to be avoided unless it's absolutely crucial.
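A small sketch of what that schema-based isolation could look like in SQL Server terms; the schema and role names are purely illustrative, and granting each module rights only on its own schema also speaks to the security concern above:
-- One schema per area; an application role only gets rights on what it needs
CREATE SCHEMA critical;   -- passwords, basic profile info
CREATE SCHEMA content;    -- user-generated content
CREATE ROLE content_app;
GRANT SELECT, INSERT, UPDATE ON SCHEMA::content TO content_app;
-- no grant on "critical", so the content module cannot read it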

Database per application VS One big database for all applications [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I'm designing a few applications that will share 2 or 3 database tables, and all of the other tables will be independent of each app. The shared tables contain mostly user information, and there might be cases where other tables need to be shared, but that's just my instinct speaking.
I'm leaning toward the one-database-for-all-applications solution because I want to have referential integrity, and I won't have to keep the same information up to date in each of the databases, but I'm probably going to end up with a database of 100+ tables where only groups of ten tables will have related information.
The database per application approach helps me keep everything more organized, but I don't know a way to keep the related tables in all databases up to date.
So, the basic question is: which of both approaches do you recommend?
Thanks,
Jorge Vargas.
Edit 1:
When I talk about not being able to have referential integrity, it's because there's no way to have foreign keys in tables when those tables are in different databases, and at least one of the tables per application will need a foreign key to one of the shared tables.
Edit 2:
Links to related questions:
SQL design around lack of cross-database foreign key references
Keeping referential integrity across multiple databases
How to salvage referential integrity with mutiple databases
Only the second one has an accepted answer. Still haven't decided what to do.
Answer:
I've decided to go with a database per application with cross-database references to a shared database, adding views to each database mimicking the tables in the shared database, and using NHibernate as my ORM. For the membership system I'll be using the ASP.NET one.
I'll also use triggers and logical deletes to try and keep to a minimum the number of ID's I'll have flying around livin' la vida loca without a parent. The development effort needed to keep databases synced is too much and the payoff is too little (as you all have pointed out). So, I'd rather fight my way through orphaned records.
Since using an ORM and Views was first suggested by svinto, he gets the correct answer.
Thanks to all for helping me out with this tough decision.
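To make the chosen approach a bit more concrete, here is a rough SQL Server-style sketch of a local view mimicking a table in the shared database; the database, schema and column names are invented for the example:
-- In each per-application database: a view that mimics the shared Users table
CREATE VIEW dbo.Users
AS
SELECT UserId, UserName, Email
FROM SharedDb.dbo.Users;
-- Application code and the ORM can treat dbo.Users as if it were local.
-- Note that a real FOREIGN KEY cannot reference a view or another database,
-- which is why the triggers and logical deletes mentioned above are needed.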
Neither way looks ideal
I think you should consider not making cross-application references in the database layer, and making them in the application layer instead. That would allow you to split it into one database per app.
I'm working on one app with 100+ tables. I have them in one database, separated by prefixes - each table has a prefix for the module it belongs to. Then I have built a layer on top of the database functions to use these custom groups. I'm also building a data administrator, which takes advantage of these table groups and makes editing data very easy.
It depends, and your options are a bit different depending on the database and frameworks you're using. I'd recommend using some sort of ORM and that way you don't need to bother that much. Anyway, you could probably put each app in its own schema in the database and then either reference the shared tables by schemaname.tablename or create views in each application schema that are just a SELECT * FROM schemaname.tablename and then code against that view.
There are no hard and fast rules to choose one over the other.
Multiple databases provide modularity. As far as syncing across multiple databases is concerned, one can use the concept of linked servers and views on top of them, and gain the advantages of an integrated database (unified access) as well.
Also, keeping multiple databases can help with better management of security, data, backup & restore, replication, scaling out, etc.!
My 2cents.
That does not sound like "a lot of applications" at all, but like "one application system with different executables". Naturally they can share one database. Make smart use of schemas to isolate the different functional areas within one database.
One database for all applications, in my opinion. Data would be stored once, with no repetition.
With the other approach you would end up replicating data, and in my opinion when you start replicating it will bring its own headaches, and data would go out of sync too.
The most appropriate approach from a scalability and maintenance point of view would be to make the "shared/common" subset of tables self-sufficient and put it in a "commons" database; for all others, have one DB per application or per logical scope (determined by the business logic), and maintain this structure consistently.
This will ease the planning and execution of commissioning/decommissioning/relocation/maintenance procedures for your software (you will know exactly which two affected DBs (commons + app-specific) are involved if you know which app you are going to touch, and vice versa).
At our business, we went with a separate database per application, with cross-database references for the small amount of shared information and an occasional linked server. This has worked pretty well across development, staging, build and production environments.
For users, our entire user base is on Windows. We use Active Directory to manage the users, with application references to groups, so that the apps don't have to manage users, which is nice. We did not centralize the group management; that is, each application has tables for groups and security, which is not so nice but works.
I would recommend, that if your applications are really different, to have a database per application. Looking back, the central shared database for users sounds workable as well.
You can use triggers for cross database referential integrity:
Create a linked server to the server that holds the database that you want to reference. Then use 4-part naming to reference the table in the remote database that holds the reference data. Then put this in the insert and update triggers on the table.
EXAMPLE (assumes single-row inserts and updates):
-- T-SQL variables are prefixed with @; use the datatype of your reference field
DECLARE @ref (datatype appropriate to your field)
SELECT @ref = refField FROM inserted
IF NOT EXISTS (SELECT *
               FROM referenceserver.refDB.dbo.refTable
               WHERE refField = @ref)
BEGIN
    RAISERROR(...)
    ROLLBACK TRAN
END
To handle multi-row inserts and updates you can join the tables on the reference field, but it can be very slow.
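A sketch of that multi-row variant, reusing the same invented names from the example above:
-- Set-based check: join every inserted/updated row to the remote reference table
IF EXISTS (SELECT 1
           FROM inserted AS i
           LEFT JOIN referenceserver.refDB.dbo.refTable AS r
                  ON r.refField = i.refField
           WHERE r.refField IS NULL)
BEGIN
    RAISERROR('Referenced value missing from refTable.', 16, 1)
    ROLLBACK TRAN
END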
I think the answer to this question depends entirely on your non-functional requirements. If you are designing an application that will one day need to be deployed across hundreds of nodes, then you need to design your database so that, if need be, it can be horizontally scaled. If on the other hand this application is to be used by a handful of users and may have a short shelf life, then your approach will be different. I recently listened to a podcast about how eBay's architecture is set up, http://www.se-radio.net/podcast/2008-09/episode-109-ebay039s-architecture-principles-randy-shoup, and they have a database per application stream and use sharding to split tables across physical nodes. Their non-functional requirements are that the system is available 24/7, is fast, can support thousands of users and does not lose any important data. eBay makes millions of pounds and so can support the effort that this takes to develop and maintain.
Anyway, this does not answer your question :) My personal opinion would be to make sure your non-functional requirements have been documented and signed off by someone. That way you can decide on the best architecture. I would be tempted to have each application using its own database and a central database for shared data. And I would try to minimise the dependencies between them, which I'm sure is not easy or you would have done it :), but I would also try to steer clear of having to produce some sort of middleware software to keep tables in sync, as this could create headaches for you.
At the end of the day you need to get your system up and running, and the guys with the pointy hair won't give a monkey's chuff about how cool your design is.
We went for splitting the database down, with one common database for all the shared tables. Since they were all on the same SQL Server instance, it didn't affect the cost of running queries across multiple databases.
The key to replication for us was that the whole server was on a Virtual Machine (VM), so to create Dev/Test environments, IT Support would just create a copy of that image and restore additional copies when required.
