What are the best practices for a team working with a database? [closed]

What is the best practice for a team working on the same database?
Should developers use their own local databases, or a shared development database instance?

In my experience, a team should have (at least) one shared database for integration.
During development, each team member should have an independent database; otherwise, schema changes can hinder other team members. These instances could also live on a centralized server.

I can only talk about the way the team I'm in currently works, which fits our needs well enough:
We have one central data model script that updates any database automatically to the latest schema version. Developers check in changes to this script together with changes to the source code (single commit on the same repository). The nightly builds update a central database copy, followed by a batch of automated tests on that database, and the human QA team also uses this same database the next day for all their testing.
We don't allow schema changes on the central database instance any other way than via the integration builds. The schema change scripts themselves are developed on a separate database instance, either on the central server or locally (depending on personal preference).
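A minimal sketch of what such a version-driven update script can look like, in Python with sqlite3; the schema_version bookkeeping table and the example migrations are illustrative assumptions, not details from the answer above:

```python
import sqlite3

# Ordered list of schema changes, checked in to source control
# together with the application code (illustrative examples).
MIGRATIONS = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customer ADD COLUMN loyalty_tier TEXT"),
]

def upgrade(conn: sqlite3.Connection) -> None:
    """Bring any database copy up to the latest schema version, idempotently."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    current = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    for version, ddl in MIGRATIONS:
        if version > current:
            conn.execute(ddl)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()

upgrade(sqlite3.connect("dev.db"))  # safe to run repeatedly
```

Because every copy records which version it is at, the same script serves a developer's sandbox, the nightly build database, and the QA copy.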

I don't think it depends at all. Each environment should have its own database instance. Personally, I would never recommend that everyone on the team work on the same copy of the source, and I view the database code and instance the same way.
If you are having problems with missing database changes, this is a symptom of a different development-process issue. It would be analogous to forgetting to add a code file to source control.
Jeff Atwood has a pretty good article on source controlling database code.
Different developers supposedly work on different issues - how do you avoid stepping on other people's toes while unit testing?
I would absolutely advocate an integration/test environment, which is updated via a Continuous Integration process. This environment often serves as a litmus test for your deployment procedure as well.

At Redgate we'd recommend that each developer is given their own instance, as sandboxing ensures that developers don't tread on each other's toes. However, there are pros and cons with both models.
In our experience talking to database developers, roughly half of database development is performed on a shared environment, and half on a dedicated per-developer environment.

From experience, a shared database is better. Our code used to break all the time because someone added a column on their local database, then uploaded their new source code to SVN. Everyone else's implementation would then break until they figured out what had changed in the database.
I would have a shared database for development. We had one or two dev databases too for miscellaneous testing.

Having separate database instances helps developers work in isolation. However, on a large team working on multiple things at the same time, it is also good to have a shared environment, so that changes that need to be delivered together don't break each other and their dependencies can be worked out correctly. A shared environment also helps identify where an application may break due to a change. So it is better to have a shared environment alongside the isolated ones, plus a replica copy of the development environment for testing anything big, complicated, and critical that absolutely has to work.
The only problem is keeping multiple environments in sync with each other, as any missing change can be fatal.
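One way to catch a missing change before it becomes fatal is to diff the schemas of two environments. Here is a rough sketch using Python's sqlite3, relying on the fact that SQLite keeps each object's CREATE statement in sqlite_master; the file names are illustrative:

```python
import sqlite3

def schema_of(path: str) -> dict:
    """Map each schema object's name to its CREATE statement."""
    conn = sqlite3.connect(path)
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE sql IS NOT NULL"
    ).fetchall()
    conn.close()
    return dict(rows)

def report_drift(db_a: str, db_b: str) -> None:
    """Print every object that differs (or is missing) between two copies."""
    a, b = schema_of(db_a), schema_of(db_b)
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            print(f"schema drift: {name}")

report_drift("shared_dev.db", "my_local.db")
```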

I know people here are speaking from their past experience, but this is not such an easy question to answer universally. There are pros and cons to each approach, and the choice should be based on the type of application and team you're working with.
If you don't know which route to go, I would suggest starting with a shared development database to see whether you encounter any problems. If this works, it should be the preferred method, since it eliminates the "integration" step. However, depending on the type of team and the environment's needs, you may need to adapt.

Related

creating database for each new company [duplicate]

I am building a SaaS application, and we are discussing one database per client vs. shared databases. I have read a lot, including some topics here on SO, but I still have many doubts.
Our platform should be highly customizable by each client (they should be able to have custom tables and add custom fields to existing tables).
The multiple-database approach seems great in this case.
The problem is: should my "users" table be in the master database or in each client database?
A user might belong to one or more organizations, so it would be present in multiple databases.
Also, what about generic tables like a countries table, etc.?
It makes sense for those to be in the master database. But I have many tables with a created_by field that is a foreign key to the user. I also have some permission-related tables per client.
I would lose the power of foreign keys with multiple databases, which means more queries to the database. I know I can use cross-database joins if the databases are on the same server, but then I lose scalability (I might need multiple database servers in the future).
I have thought about federated tables, but I am not sure about their performance.
The technologies I am using are PHP with the Symfony 2 framework, and MySQL for the database.
Also, I am worried about the maintenance of such a system. We could create scripts to automate schema changes across all databases, but if we have 10k clients, that would mean 10k databases.
What is your opinion about this?
The main characteristic of my app should be flexibility, so if a client needs something more specific than the base platform provides, it should be possible to do it for him.
Some classic problems here. Have you ever been to http://highscalability.com/? Some good case studies there.
From personal experience: if you try to share clients on one server, you will find that a very successful/active user will take up all the resources of the machine over time. We had one client in a SaaS that destroyed a shared server, and we had to move him somewhere else.
I would rip the global enumerations out into a service. You can make one central database for things like the list of countries, the list of states, etc., and put it behind a web service layer. In that same database you can keep user management: which server belongs to which user, and so on. You can then build a management portal that reads/writes to this database for managing your user base. A sketch of that routing lookup follows.
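A minimal sketch of the central management database and the routing lookup, in Python with sqlite3 for self-containment; the table layout and the DSN idea are illustrative assumptions, not a prescribed design:

```python
import sqlite3

# Central management database: users, organizations, and which
# database/server each organization's data lives on (illustrative).
master = sqlite3.connect(":memory:")
master.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE organization (id INTEGER PRIMARY KEY, dsn TEXT);
    CREATE TABLE membership (user_id INTEGER, org_id INTEGER);
""")

def database_for(user_id: int, org_id: int) -> str:
    """Resolve which client database a request should be routed to."""
    row = master.execute(
        """SELECT o.dsn FROM organization o
           JOIN membership m ON m.org_id = o.id
           WHERE m.user_id = ? AND o.id = ?""",
        (user_id, org_id),
    ).fetchone()
    if row is None:
        raise LookupError("user does not belong to this organization")
    return row[0]  # in the real system, e.g. a per-client MySQL DSN
```

Keeping the users and the organization-to-server mapping in one master database addresses the asker's "users table" dilemma without requiring cross-database foreign keys.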
If I were doing a SaaS again, I would start small and wait for the pain to hit. What you really want are good tools to address the scaling issues when they happen. Have scripts ready to do rolling schema changes across servers (there is no way to avoid this once you have more than one server; a sketch follows). Have scripts to take machines down while you are modifying the schema. Have scripts to migrate a user from a shared server to a dedicated one.
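A rough sketch of such a rolling schema-change script, again in Python with sqlite3; the tenant list, table, and column are invented for illustration:

```python
import sqlite3

# In a real system this list would come from the central management DB.
TENANT_DBS = ["client_001.db", "client_002.db", "client_003.db"]

DDL = "ALTER TABLE invoice ADD COLUMN currency TEXT DEFAULT 'USD'"

def rolling_migrate() -> None:
    """Apply one schema change tenant by tenant, recording failures so
    the run can be resumed rather than restarted from scratch."""
    failed = []
    for path in TENANT_DBS:
        try:
            conn = sqlite3.connect(path)
            conn.execute(DDL)
            conn.commit()
            conn.close()
        except sqlite3.Error as exc:
            failed.append((path, str(exc)))
    for path, reason in failed:
        print(f"needs retry: {path}: {reason}")

rolling_migrate()
```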
Consider setting up replication from a central database. This would pump down global information that each user partition/database would need without you having to write a lot of code.
But the biggest piece of advice I've seen - and experienced first hand - don't try too hard to build the next Facebook for scale. Start simple and see what actually happens before worrying about major scalability issues. You might be surprised as the user base grows what scales well and what does not.

Using a common database for collaborative development

Some of the people in my project seem to think that using a common development database with everyone connecting to it is the best thing. I think that it isn't and each developer having his own database (with periodic updated data dumps) is the best. Am I right or wrong? Have you encountered any problems in any of these approaches?
Disk space and CPU should be cheap enough that every developer can run their own instance of the database, with an automated build under version control. This is needed to allow developers to be bold in hacking on the database, in isolation from any other developer's concurrent hacking.
The caveat being, of course, that any changes they make to their private instance are useless to anyone else unless it can be automatically applied during the build process. So there needs to be a firm policy that application code can't depend on any database state unless that state is represented by version-controlled, unit-tested changes to the DDL.
For an excellent guide on the theory and practice of treating the database definition as another part of the project code, and coordinating changes and refactorings, see Refactoring Databases: Evolutionary Database Design by Scott W. Ambler and Pramod Sadalage.
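One way to make that policy concrete is a test that builds a throwaway instance from the checked-in DDL alone and asserts the shape the application relies on. A minimal sketch with Python's sqlite3 and unittest (the account table is an invented example):

```python
import sqlite3
import unittest

# The version-controlled DDL that application code is allowed to rely on.
DDL = "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"

class SchemaContractTest(unittest.TestCase):
    def test_fresh_build_has_expected_columns(self):
        conn = sqlite3.connect(":memory:")  # private, disposable instance
        conn.execute(DDL)
        cols = [row[1] for row in conn.execute("PRAGMA table_info(account)")]
        self.assertEqual(cols, ["id", "balance"])

if __name__ == "__main__":
    unittest.main()
```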
I like having my own copy of the database for development, because it gives you the flexibility to rapidly change things without worrying how it will impact others.
However, if all the developers are hacking away on their own copy of the database, it becomes more and more difficult to merge everyone's work together in the end.
I think you can get the best of both worlds by letting developers work on a local copy during day-to-day development, but each developer should probably merge their work into a common copy on a pretty regular basis. Writing a lot of unit tests helps too.
We share a single database amongst all our developers (20-odd), but we've got it structured so that everyone has their own tables.
You don't need a separate database per developer if you structure the application right. It should be configurable which database or table prefix it uses anyway, so you can easily move it between instances (unit test, system test, acceptance test, production, disaster recovery, and so on).
The advantage to using a single database is that the cost of maintenance is amortized. You don't have your DBAs trying to handle a lot of databases (or, if you're a small-DB shop, you don't have every developer trying to maintain their own database when they're better utilized in developing).
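A tiny sketch of the configurable table-prefix idea in Python; the environment variable name and the query are illustrative assumptions:

```python
import os

# Each developer (or environment) sets its own prefix, e.g. "alice_",
# so one shared database instance can hold everyone's tables.
PREFIX = os.environ.get("DB_TABLE_PREFIX", "")

def table(name: str) -> str:
    """Qualify a logical table name with the configured prefix."""
    return f"{PREFIX}{name}"

# Values are still bound with placeholders; only the identifier is built here.
query = f"SELECT * FROM {table('orders')} WHERE status = ?"
print(query)
```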
Having a single point of failure is not a good thing, is it?
I prefer a single, shared database. But it's very dependent on the situation and the applications being developed.
What works for me may not work for you. Go with your gut.
If you are working with Hibernate or any Hibernate-based platform, you can configure your database to be created when you start your server (the create-drop option). This is very useful when you are adding new attributes to your classes. In this case, each developer must have his own copy of the DB.
If you are not changing the DB structure at all, then you can use a single shared DB.
In this second case it is not a must. I prefer to have my own DB where I can do whatever I want. On the other hand, remember that some queries can take a long time, and that will affect your whole team if you are sharing a DB.
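For readers outside the Hibernate world, a rough Python analogue of the create-drop idea, assuming sqlite3; the schema is invented for illustration:

```python
import sqlite3

SCHEMA = "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"

def fresh_db() -> sqlite3.Connection:
    """Rebuild the schema from scratch on every start, mirroring
    Hibernate's create-drop: each developer gets a disposable copy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

conn = fresh_db()  # added a new attribute to a class? edit SCHEMA and restart
```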

Database development organisation

A question regarding a DB development project. The database already exists and is rather large (several TBs).
What do you use for version control in DB development?
How do you control concurrent changes to the data model by different teams?
What is your approach to unit testing in DB development?
How do you deal with the sensitive data if the DB owners do not know what is sensitive? What is your approach to the data obfuscation? What are your obfuscation techniques?
How do you work on a large DB from several locations?
Please answer one or more of the items as you see fit. Each answer will be reviewed separately. Thank you very much!
EDIT:
A related question with good answers to the p.1 is here: How do you version your database schema?
For most of these, while the tools don't apply, the general processes of code development do:
Maintain a development system separate from production with enough data to get useful performance metrics when testing a new model
This system has unit tests (SQL queries, commits, aborted atomic commits, etc.) written and run against it prior to every release (see the rollback sketch after this list).
There are official 'releases'
The development database is the source control system itself - in other words the database is modeled and held in the database with sign-ins and rollbacks, etc. It's non-trivial, and doesn't solve every problem, but given the lack of good VCS for databases it works.
Roll-outs (after testing, integration, etc) consist of just the new database structure going to the production site - the modeling tables are not replicated there.
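A minimal sketch of a test for an aborted atomic commit, using Python's sqlite3 and unittest; the ledger table is an invented example:

```python
import sqlite3
import unittest

class AtomicCommitTest(unittest.TestCase):
    def test_aborted_transaction_leaves_no_rows(self):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE ledger (amount INTEGER NOT NULL)")
        conn.commit()
        try:
            conn.execute("INSERT INTO ledger VALUES (100)")
            conn.execute("INSERT INTO ledger VALUES (NULL)")  # violates NOT NULL
            conn.commit()
        except sqlite3.IntegrityError:
            conn.rollback()  # abort the whole unit of work
        count = conn.execute("SELECT COUNT(*) FROM ledger").fetchone()[0]
        self.assertEqual(count, 0)  # the first insert must not survive

if __name__ == "__main__":
    unittest.main()
```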
For 4, "How do you deal with the sensitive data if the DB owners do not know what is sensitive? What is your approach to the data obfuscation?"
"Sensitive until proven innocuous" is my mantra. Unless someone makes a case for not adequately protecting any data from visibility (either internal or external) then my default mode is to protect it.
Cases come up later where we'll open data up for performance, reporting, etc., but a documented business case with the appropriate signatures is required.
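One common obfuscation technique consistent with that stance is deterministic masking: the same input always maps to the same token, so joins and group-bys still work on the scrubbed copy. A sketch in Python (the salt and output format are illustrative):

```python
import hashlib

def mask_email(email: str, salt: str = "per-project-secret") -> str:
    """Replace an address with a stable, non-reversible token."""
    digest = hashlib.sha256((salt + email).encode()).hexdigest()[:12]
    return f"user_{digest}@example.invalid"

print(mask_email("jane.doe@corp.com"))  # same input -> same token every run
```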

Where to put your code - Database vs. Application? [closed]

I have been developing web/desktop applications for about 6 years now. During the course of my career, I have come across applications that were heavily written in the database using stored procedures, whereas a lot of applications had only a few basic stored procedures (to read, insert, edit, and delete entity records) for each entity.
I have seen people argue that if you have paid for an enterprise database, you should use its features extensively. Whereas a lot of "object-oriented architects" have told me it's an absolute crime to put anything more than necessary in the database, and that you should be able to drive the application using the methods on those classes.
Where do you think is the balance?
Thanks,
Krunal
I think it's a business logic vs. data logic thing. If there is logic that ensures the consistency of your data, put it in a stored procedure. Same for convenience functions for data retrieval/update.
Everything else should go into the code.
A friend of mine is developing a host of stored procedures for data analysis algorithms in bioinformatics. I think his approach is quite interesting, but not the right way in the long run. My main objections are maintainability and lacking adaptability.
I'm in the object oriented architects camp. It's not necessarily a crime to put code in the database, as long as you understand the caveats that go along with that. Here are some:
It's not debuggable
It's not subject to source control
Permissions on your two sets of code will be different
It will make it more difficult to track where an error in the data came from if you're accessing info in the database from both places
Anything that relates to referential integrity or consistency should be in the database as a bare minimum. If it's in your application and someone wants to write another application against the database, they are going to have to duplicate your code in their code to ensure that the data remains consistent.
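A small demonstration of that point with Python's sqlite3 (note SQLite only enforces foreign keys when the pragma is enabled; the tables are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    );
""")

try:
    conn.execute("INSERT INTO orders (customer_id) VALUES (42)")  # no such customer
except sqlite3.IntegrityError as exc:
    # Every application hitting this database gets the same guarantee for free.
    print("rejected by the database itself:", exc)
```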
PL/SQL for Oracle is a pretty good language for accessing the database, and it can also give performance improvements. Your application can also be much 'neater', as it can treat the database stored procedures as a 'black box'.
The sprocs themselves can also be tuned and modified without you having to go near your compiled application; this is also useful if the supplier of your application has gone out of business or is unavailable.
I'm not advocating that 'everything' should be in the database, far from it. Treat each case separately and logically, and you will see which makes more sense: put it in the app, or put it in the database.
I'm coming from almost the same background and have heard the same arguments. I do understand that there are very valid reasons to put logic into the database. However, it depends on the type of application and the way it handles data which approach you should choose.
In my experience, a typical data entry app like some customer (or xyz) management will massively benefit from using an ORM layer as there are not so many different views at the data and you can reduce the boilerplate CRUD code to a minimum.
On the other hand, if you have an application with a lot of concurrency, calculations that span many tables, and a fine-grained column-level security concept with locking and so on, you're probably better off doing that work directly in the database.
As mentioned before, it also depends on the variety of views you anticipate for your data. If there are many different combinations of columns and tables that need to be presented to the user, you may also be better off just handing back different result sets rather than map your objects one-by-one to another representation.
After all, the database is good at dealing with sets, whereas OO code is good at dealing with single entities.
Reading these answers, I'm quite confused by the lack of understanding of database programming. I am an Oracle PL/SQL developer; we source-control every bit of code that goes into the database. Many of the IDEs provide add-ins for most of the major source control products, from ClearCase to SourceSafe. The Oracle tools we use allow us to debug the code, so debugging isn't an issue. The issue is more one of logic and accessibility.
As a manager of support for about 5000 users, the fewer places I have to look for the logic, the better. If I want to make sure the logic is applied for ALL applications that use the data, even business logic, I put it in the DB. If the logic is different depending on the application, the application can be responsible for it.
#DannySmurf:
It's not debuggable
Depending on your server, yes, they are debuggable. This provides an example for SQL Server 2000. I'm guessing the newer ones also have this. However, the free MySQL server does not have this (as far as I know).
It's not subject to source control
Yes, it is. Kind of. Database backups should include stored procedures. Those backup files might or might not be in your version control repository. But either way, you have backups of your stored procedures.
My personal preference is to keep as much logic and configuration out of the database as possible. I am heavily dependent on Spring and Hibernate these days, so that makes it a lot easier. I tend to use Hibernate named queries instead of stored procedures, and keep static configuration in Spring application context XML files. Anything that needs to go into the database has to be loaded using a script, and I keep those scripts in version control.
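The named-query idea translates to any stack: keep the SQL as named, version-controlled strings next to the application code instead of as stored procedures. A minimal sketch in Python with sqlite3 (the query names and table are illustrative):

```python
import sqlite3

# Named queries live in source control with the application code,
# loosely analogous to Hibernate named queries.
NAMED_QUERIES = {
    "user.byEmail": "SELECT id, name FROM user WHERE email = ?",
    "user.count":   "SELECT COUNT(*) FROM user",
}

def run(conn: sqlite3.Connection, name: str, *params):
    """Look up a named query and execute it with bound parameters."""
    return conn.execute(NAMED_QUERIES[name], params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO user (name, email) VALUES ('Ada', 'ada@example.org')")
print(run(conn, "user.byEmail", "ada@example.org"))
```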
#Thomas Owens: (re source control) Yes, but that's not source control in the same sense that I can check in a .cs file (or .cpp file or whatever) and go and pick out any revision I want. To do that with database code requires a potentially-significant amount of effort to either retrieve the procedure from the database and transfer it to somewhere in the source tree, or to do a database backup every time a minor change is made. In either case (and regardless of the amount of effort), it's not intuitive; and for many shops, it's not a good enough solution either. There is also the potential here for developers who may not be as studious at that as others to forget to retrieve and check in a revision. It's technically possible to put ANYTHING in source control; the disconnect here is what I would take issue with.
(re debuggable) Fair enough, though that doesn't provide much integration with the rest of the application (where the majority of the code could live). That may or may not be important.
Well, if you care about the consistency of your data, there are reasons to implement code within the database. As others have said, placing code (and/or RI/constraints) inside the database acts to enforce business logic, close to the data itself. And, it provides a common, encapsulated interface, so that your new developer doesn't accidentally create orphan records or inconsistent data.
Well, this one is difficult. As a programmer, you'll want to avoid T-SQL and similar "database languages" as much as possible, because they are horrendous, difficult to debug, and not extensible, and there's nothing you can do with them that you can't do with code in your application.
The only reasons I see for writing stored procedures are:
Your database isn't great (think of how SQL Server doesn't implement LIMIT, and you have to work around that using a procedure; see the pagination sketch after this answer).
You want to be able to change a behaviour by changing code in just one place without re-deploying your client applications.
The client machines have big calculation-power constraints (think small embedded devices).
For most applications though, you should try to keep your code in the application where you can debug it, keep it under version control and fix it using all the tools provided to you by your language.
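For the LIMIT example above: SQL Server versions before OFFSET/FETCH was added are commonly worked around with a ROW_NUMBER() wrapper. A sketch that builds such a query in Python; the query and column names are illustrative:

```python
def paginate_tsql(inner_sql: str, order_by: str, page: int, size: int) -> str:
    """Build the common ROW_NUMBER() wrapper that emulates LIMIT/OFFSET
    on SQL Server versions that lack them (a sketch, not a query builder)."""
    first = (page - 1) * size + 1
    last = page * size
    return (
        "SELECT * FROM ("
        f" SELECT ROW_NUMBER() OVER (ORDER BY {order_by}) AS rn, q.*"
        f" FROM ({inner_sql}) AS q"
        f") AS numbered WHERE rn BETWEEN {first} AND {last}"
    )

print(paginate_tsql("SELECT id, name FROM customer", "id", page=2, size=20))
```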
