Synchronizing multiple servers and local machines with the same data / source code - database

Background:
We are building a web app with a team of two developers: a RESTful backend in Flask, running on Linux with Apache, Redis and Postgres.
3 servers:
1 for production
1 for development
1 for UAT
4 databases:
1 for PROD/UAT server
1 for DEV server
1 for developer A on his local machine
1 for developer B on his local machine
2 local machines / developers
In addition to these four databases, each developer has one more database used for testing. This testing database needs to be identical between the two developers at all times.
Developer B works on his own fork and sends pull requests to the master repo, which developer A works on.
Problem:
We have no real protocols for transferring data between the databases. For example, the developers' test databases often differ, which causes chaos. Moving data from DEV to UAT/PROD is done manually.
Developers work in different environments and on different forks. We use pull requests on GitHub to move code into Developer A's main repo.
Question
What do you recommend as a solution to our database woes? Is there a better way to share data? Is there a better way for developer A and developer B to share their environment and source code?

I have been working on this problem for an environment of similar size this year. The technology is different, but in short:
2-3 developers
Each developer has a complete environment
Integration, UAT and production server pairs (all VMs).
Linux, MongoDB, Django and Angular
Code
After a few false starts we have settled on the feature branch approach, which is working well. See http://nvie.com/posts/a-successful-git-branching-model/. At the moment we have master for production and a development branch. Only one person works on a 'feature' (or we pair). Features should be short-lived. We might switch between multiple features when necessary, and we can merge from one feature to another when we need to. It all comes out quite cleanly when we merge back into development. The overhead is minimal and we rarely step on each other's toes. When we do, the diff makes it clear what we need to fix manually. A good Git UI tool helps.
Database
You could use a shared database for development. We don't. Every developer has an independent environment. We sometimes need to copy collections, but usually from UAT or prod rather than from each other. I have created a comprehensive admin page. On any environment we can export a collection set, and there are mechanisms to send this export wherever it is needed. We use this to take prod data and place it in UAT to replicate problems; we can then move it back down to integration and dev to work on the fix. Developers can also share data with each other using the same admin page.
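A minimal sketch of what an "export a collection set" action could look like, assuming a pymongo-style database object where `db[name].find()`, `delete_many()` and `insert_many()` behave as they do in pymongo. The function names and the bundle layout are hypothetical, not the author's actual admin page.

```python
import json
from datetime import datetime, timezone


def export_collection_set(db, names, source_env):
    """Dump the named collections into one JSON bundle tagged with its origin."""
    bundle = {
        "source_env": source_env,
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "collections": {name: list(db[name].find()) for name in names},
    }
    # default=str turns non-JSON types (ObjectId, datetime, ...) into strings
    return json.dumps(bundle, default=str)


def import_collection_set(db, payload, replace=True):
    """Load a bundle produced by export_collection_set into `db`."""
    bundle = json.loads(payload)
    for name, docs in bundle["collections"].items():
        if replace:
            db[name].delete_many({})  # wipe the target collection first
        if docs:
            db[name].insert_many(docs)
    return bundle["source_env"]
```

Because `default=str` flattens types such as ObjectId and dates into strings, a real implementation would more likely serialize with `bson.json_util`, which preserves them.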
Release
Going up the chain from dev -> integration -> uat -> prod is also handled by the application admin page. Any system can export, but only to the next stage up. Nothing is imported automatically: the admin on the target environment is told that a release is available and can import it from the admin page. We do the same for code and database collections.
It is useful to have this integrated rather than having to know which script to run. It could be a separate application if that worked better in your environment.
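The promotion rule described here (export only to the next stage, import only when an admin on the target acts) can be sketched as a small state holder. `STAGES`, `next_stage` and `ReleaseInbox` are illustrative names; notifications and the actual data transfer are left out.

```python
from dataclasses import dataclass, field

# Promotion order: each stage may only export to the one directly above it.
STAGES = ["dev", "integration", "uat", "prod"]


def next_stage(stage):
    i = STAGES.index(stage)  # raises ValueError for an unknown stage
    if i == len(STAGES) - 1:
        raise ValueError("prod is the last stage; nothing to promote to")
    return STAGES[i + 1]


@dataclass
class ReleaseInbox:
    """Pending releases per stage; nothing is applied until an admin imports it."""
    pending: dict = field(default_factory=lambda: {s: [] for s in STAGES})
    imported: dict = field(default_factory=lambda: {s: [] for s in STAGES})

    def export(self, from_stage, release):
        target = next_stage(from_stage)
        self.pending[target].append(release)
        return target  # the target environment's admin would be notified here

    def import_release(self, stage, release):
        # Explicit manual step; raises ValueError if the release is not pending here
        self.pending[stage].remove(release)
        self.imported[stage].append(release)
```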

Related

Using databases with deploys on each commit

It's become popular with modern cloud deployment services like Vercel, Netlify, Linc, and so on to deploy web apps on every commit for pull requests. This makes a lot of sense for frontend code.
It's also become popular, however, with frameworks like NextJS, to deploy one's API in the same codebase and infrastructure as the frontend code. But APIs often require a database to function, and databases often change schemas with migrations. To me, this means that preview deploys in a frontend/API monolith could often fail if the data model changes on a branch.
How have others handled this "Preview Deploys" development pattern, when databases get involved? Is there an elegant way to spin up separate database instances per preview deploy, that match the schemas/migrations defined per-branch, and work well with these very distributed, often serverless hosting providers?
Some initial thoughts
Is there a fast and cheap way to spin up a DB from a template on deploys (based on a project's main branch), run a particular branch's migrations on it, and have a Vercel preview deploy somehow discover that new DB instance? What if such a database has a lot of data; can it still be kept snappy? Maybe with Docker images?
Maybe a checked-in SQLite database would make sense to easily support preview deploys, as long as preview deploys are assumed not to see much parallel access? But unless you're using SQLite as your prod DB, it creates a significant difference between preview and production environments.
Or perhaps coupling one's data model to preview deploys is a bad idea altogether and data access ought to be kept separate from this kind of frontend code.
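A rough sketch of the first idea, using SQLite files as a stand-in for a real database server: copy a template database built from the main branch, then apply only those migrations from the deployed branch that the template has not already recorded. The `schema_migrations` table and the `(version, sql)` list format are assumptions for illustration; on PostgreSQL, `CREATE DATABASE ... TEMPLATE ...` would play the role of the file copy.

```python
import shutil
import sqlite3


def branch_database(template, target, migrations):
    """Copy a template DB file, then apply branch migrations it hasn't seen.

    `migrations` is a list of (version, sql) pairs, assumed to be read from the
    branch being deployed. Returns the versions that were applied, in order.
    """
    shutil.copyfile(template, target)
    conn = sqlite3.connect(target)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
        )
        done = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
        applied = []
        for version, sql in sorted(migrations):
            if version in done:
                continue  # already part of the main-branch template
            conn.executescript(sql)
            conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
            applied.append(version)
        conn.commit()
        return applied
    finally:
        conn.close()
```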
Curious to hear what other people have done.
A couple of ways I can think of (and use):
Environment variables
In Vercel, for example, you can configure separate environment variables for production, staging and development.
You could set the staging database_uri environment variable to an in-memory SQLite database (serverless functions on Vercel have no persistent filesystem) for manual testing or demonstrations, and use a different value in the production environment.
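A minimal sketch of the environment-variable approach, assuming the variable is called `DATABASE_URI` and uses a `sqlite:///` prefix for SQLite; only the SQLite case is wired up here, and a production URI would go to the appropriate driver instead.

```python
import os
import sqlite3


def open_database(env=None):
    """Connect using DATABASE_URI, falling back to an in-memory SQLite DB."""
    env = os.environ if env is None else env
    uri = env.get("DATABASE_URI", "sqlite:///:memory:")
    if not uri.startswith("sqlite:///"):
        # e.g. a Postgres URI in production would be handled by its own driver
        raise NotImplementedError(f"no driver wired up for {uri.split(':', 1)[0]}")
    return sqlite3.connect(uri[len("sqlite:///"):])  # ":memory:" -> in-memory DB
```

An in-memory database is empty on every cold start and is not shared between function instances, so this only suits the manual testing and demonstrations mentioned above.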
Branches
Another way is to use main or master as the development branch and have a separate production branch. Configure Vercel (or your platform) to build only when this production branch is pushed to.
You won't push to this branch directly; instead you'll pull from main (or whatever your default branch is) and merge into the production one.
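One way to get "build only on the production branch" is a script used as the project's Ignored Build Step. This sketch assumes Vercel's convention that the command exiting with 1 lets the build proceed and exiting with 0 skips it, and that the branch name is exposed as `VERCEL_GIT_COMMIT_REF`; check the current Vercel docs before relying on either. The branch name `production` is hypothetical.

```python
import os

PRODUCTION_BRANCH = "production"  # hypothetical name for the production branch


def should_build(branch):
    return branch == PRODUCTION_BRANCH


def main(env=None):
    env = os.environ if env is None else env
    branch = env.get("VERCEL_GIT_COMMIT_REF", "")
    # Assumed Ignored Build Step convention: exit 1 = build, exit 0 = skip
    return 1 if should_build(branch) else 0
```

Run as a standalone script, it would end with `sys.exit(main())`.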

MongoDB vs MongoDB Atlas

I am a new web developer and have some questions regarding MongoDB.
The site I am working on follows references that saved data locally with MongoDB. But after doing some research, I came across MongoDB Atlas, which saves data to the cloud. My question is: if I were to host a website, would it matter which one I chose? Or would I be restricted to Atlas? And why would someone pick one over the other?
MongoDB Atlas is a hosted MongoDB service run by the company that makes MongoDB (which means they typically know what they're doing). It's handy because everything is configured for you automatically, and you get dashboards, monitoring, backups, upgrades, etc. They also have a free tier (M0, which has some important restrictions; read more on their site). As usual with cloud offerings, pricing is good for starters, but costs can skyrocket if you're operating at significant scale.
If you choose to install MongoDB server "locally", you need to configure the cluster yourself (although there are plenty of pre-configured MongoDB Docker images out there), set up backups, arrange monitoring, etc. That is a lot of work if you want to do it properly.
Considering above, here is my advice...
Choose MongoDb Atlas when:
You have a small personal project
You're a startup and you believe you will have tens of thousands of users soon: Atlas lets you bootstrap things quickly and at relatively small cost
You're a medium sized company, you're fine with MongoDb pricing, and you don't expect to grow too much
Choose manual installation of MongoDb when:
You have a small single-server project that is not likely to grow into a multi-server deployment. You can run MongoDB in Docker on the same server. This is usually bad practice in general, but it works fine for small workloads. I've used this setup (as part of a Meteor Up deployment) and it worked fine with thousands of regular users (depending on your application's usage patterns).
You're a Unicorn-level startup or bigger
You're building something for internal use and are restricted from using cloud deployments
Your main servers are not located in the cloud. Each MongoDB query is a separate network round trip (the driver won't combine unrelated queries into one request), so it is very important that your MongoDB server is in the same datacenter as your backend servers; otherwise latency will kill your performance
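To make the latency point concrete, a back-of-the-envelope estimate: if a page issues its queries one after another, the round-trip time is paid once per query. The query count, round-trip times and the 1 ms server time below are illustrative assumptions, not measurements.

```python
def page_db_time_ms(queries, round_trip_ms, server_ms=1.0):
    """Rough DB time for one page whose queries run one after another."""
    return queries * (round_trip_ms + server_ms)


# Illustrative numbers: 50 sequential queries per page
for label, rtt in [("same datacenter", 0.5), ("cross-region", 40.0)]:
    print(f"{label}: {page_db_time_ms(50, rtt):.0f} ms")
# same datacenter: 75 ms
# cross-region: 2050 ms
```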

Best practice for production and test environments in Google App Engine

What are the best practices for production and test (staging) environments in Google App Engine? Is it a good idea to setup separate projects?
We also use Google Cloud Storage and Cloud SQL. I'd like to prevent accidents where someone is mistakenly working in production when they intend to work in test.
We'll be storing a lot of stuff in GCS. From my understanding, GCS environments are separate between projects, which can be desirable for us. But if we want to copy production to test, is it possible to duplicate GCS data from one app to another?
Looking forward to hearing how others do this.
Bruyere's answer is technically right, you can either version your app or use separate projects.
In practice, I've done both, and you always end up needing separate projects for a ton of good reasons:
You might not want the same people to have the rights to update the staging env (all the devs would have this ability, for example) and the production env (typically this would be restricted to the tech lead, the QA team, or your continuous integration server)
Isolating two App Engine versions is not that easy, in particular when you deal with cron jobs, or incoming email or XMPP
You might not want the same people to be able to read/write the staging data and the prod data
You want to make sure that the App Engine prod app does not write to the staging Cloud Storage bucket. If they are part of the same project, by default this is possible
My recommendation is to store the environment-related data (Cloud Storage bucket, Cloud SQL URL, etc.) in a configuration file that is loaded by the application. If you use Java, I personally use a properties file that is populated by Maven based on two profiles (dev and prod, with dev as the default).
Another important point is to separate environments from the start. Once you've started assuming that both environments will live in the same application, a lot of your code will be developed based on that assumption and it will be harder to move back to two different projects.
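A minimal sketch of the configuration-file approach in Python rather than Maven, assuming an `APP_ENV` variable selects the environment and defaults to dev. All project, bucket and Cloud SQL names are made up, and `running_project` stands for however your runtime reports the current project ID. The extra check addresses the concern about accidentally touching production while meaning to work in test.

```python
import os

# Hypothetical per-environment settings; in a real project these would live in
# separate files (e.g. config/dev.json, config/prod.json) rather than in code.
CONFIGS = {
    "dev": {
        "gcp_project": "acmeco-app-dev",
        "bucket": "acmeco-app-dev-files",
        "cloud_sql": "acmeco-app-dev:us-central1:main",
    },
    "prod": {
        "gcp_project": "acmeco-app-prod",
        "bucket": "acmeco-app-prod-files",
        "cloud_sql": "acmeco-app-prod:us-central1:main",
    },
}


def load_config(env=None):
    """Select settings via APP_ENV; dev is the default, as with the Maven profiles."""
    env = os.environ if env is None else env
    name = env.get("APP_ENV", "dev")
    if name not in CONFIGS:
        raise KeyError(f"unknown environment {name!r}")
    return CONFIGS[name]


def check_project(config, running_project):
    """Refuse to start if the config targets a different project than the one we run in."""
    if config["gcp_project"] != running_project:
        raise RuntimeError(
            f"config is for {config['gcp_project']} but app runs in {running_project}"
        )
```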
I was also wondering whether there was an option other than separate projects for UAT/prod environments, and found this article in Google's documentation, which says you should use separate projects.
Best Practices for Enterprise Organizations
https://cloud.google.com/docs/enterprise/best-practices-for-enterprise-organizations
We recommend that you spend some time planning your project IDs for
manageability. A typical project ID naming convention might use the
following pattern:
[company tag]-[group tag]-[system name]-[environment (dev, test, uat, stage, prod)]
For example, the development environment for the human resources
department's compensation system might be named acmeco-hr-comp-dev.
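The naming convention above can be enforced with a small helper. The allowed environments come from the quoted pattern; the length and character rules (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter, not ending with a hyphen) reflect Google Cloud's project ID constraints as I understand them, so verify them against the current docs.

```python
import re

ENVIRONMENTS = ("dev", "test", "uat", "stage", "prod")
# First char a letter, 4-28 middle chars, last char not a hyphen: 6-30 total.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def project_id(company, group, system, environment):
    """Build [company]-[group]-[system]-[environment] and validate it."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment {environment!r}")
    pid = "-".join([company, group, system, environment]).lower()
    if not PROJECT_ID_RE.match(pid):
        raise ValueError(f"{pid!r} is not a valid project ID")
    return pid
```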
I can see two ways to do this, depending on your needs:
1) Using versions of your app, a separate Cloud SQL instance and a different GCS bucket name, you can use the same project. You just have to be very careful about which target each of your calls goes to, and redirect them when the version goes live.
2) Using a separate project is probably the safer option, but either way you will need a unique bucket name: bucket names must be globally unique across all of GCS.
It's quite easy to copy from one bucket to another once your permissions are set up. Using gsutil you can copy from bucket to bucket.
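For copying programmatically rather than with gsutil, a sketch using the google-cloud-storage Python client, where `client` is assumed to be a `google.cloud.storage.Client` whose credentials can read the source bucket and write the destination bucket (which may live in another project). Bucket names are placeholders.

```python
def copy_bucket(client, src_name, dst_name, prefix=""):
    """Copy every object (optionally only those under `prefix`) to another bucket.

    `client` is assumed to be a google.cloud.storage.Client.
    """
    src = client.bucket(src_name)
    dst = client.bucket(dst_name)
    copied = []
    for blob in client.list_blobs(src_name, prefix=prefix):
        # Server-side copy; object names are kept unchanged
        src.copy_blob(blob, dst, new_name=blob.name)
        copied.append(blob.name)
    return copied
```

The command-line equivalent is along the lines of `gsutil -m rsync -r gs://SOURCE_BUCKET gs://DEST_BUCKET`.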

Data Dude/VS Team System Database - Use with multi project databases

My current project uses Visual Studio Team System for Database Professionals GDR2 (aka DataDude). We are the only application using the database that we model using DataDude.
My company would like to consider using DataDude across the board on all our projects. However, I am not sure how well this will work with projects that share a database (which is the bulk of our applications).
For example: ApplicationA, ApplicationB and ApplicationC all share Database1 on Server1. (They don't share source code, just the database.) All three applications are under current development (using Scrum if it matters).
The problem comes when ApplicationB needs to release to our test environment. The auto deployment/scripting features of DataDude would catch the current dev changes of ApplicationA and ApplicationC. (Right now making the Database changes for each application is a manual process).
So, how can I insulate each Application from the other while they share the same database?
Note: I am not as concerned about conflicting changes for this question (i.e. if ApplicationA makes a DB change that breaks ApplicationC). We can find those in testing. I just need to make sure I don't move any database changes that are not part of the currently releasing application to my Test/Production environments.
Are there any best practices or features that can help me out with this?
We are in a similar situation. We have many applications hitting the same database, and our database is under DBPro source control. We handle this by having various applications work in their own branch of the database source code. Each application will merge down from the main branch on a regular basis so its branch is aware of changes made by others. Then, when one of the applications needs to deploy to testing, a merge up to the main branch is done and then a deployment to the testing server is done.

One DB per developer or not?

In a corporate development environment writing mostly administrative software, should every developer use their own database instance, or should they use a central database instance during development? What are the advantages and disadvantages of each approach? What about other environments and other products?
If you all share the same database, you may have issues when someone makes a structural change to the database and the code is not synchronized with it.
I highly recommend one DB per developer, if only because you don't want to run a "write" test and see someone else overwrite your data right after. A simple example? You try to display products on a website. Everything works until all the products disappear. The problem? Another developer decided to play with the products' "Active" flag to test something else. In cases like that, even a transaction might not help. End of story: you spend time debugging someone else's action.
I highly recommend replicating the staging database to the developer database once in a while to synchronize the structure (or better, have a tool to rebuild a database from scratch).
Of course, we require scripts for changes to the database, and EVERYTHING is in source control.
The days when database environments had to be scarce are long gone. I'm writing this post on an XW9300 with 5x15k SCSI disks in it. This machine will run a substantial ETL job in a fairly reasonable length of time and (in mid-2007) cost me about £1,700 on eBay including the disks. From a developer's perspective, especially on database-centric projects like data warehousing, the line between a developer and a DBA is quite blurred. As I write this I am building a partition management framework for a SQL Server 2005 data warehouse.
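A sketch of refreshing a developer database from staging, with SQLite standing in for the real DBMS: `Connection.backup` copies schema and data in one consistent operation and replaces whatever the destination held. For a server database you would use its own dump/restore tooling (for example `pg_dump`/`pg_restore` on PostgreSQL) instead. File names are hypothetical.

```python
import sqlite3


def refresh_from_staging(staging_path, dev_path):
    """Overwrite the developer DB with a consistent copy of staging."""
    src = sqlite3.connect(staging_path)
    dst = sqlite3.connect(dev_path)
    try:
        src.backup(dst)  # copies every page: schema and data
    finally:
        dst.close()
        src.close()
```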
Developers should have one or more development databases of their own for (IMO) these reasons:
Requires people to keep stored procedures, patch scripts and schema definition files in source control. Applying the patches can be automated to a fairly large extent. There are even tools such as Redgate SQL Compare Pro that do much of the grunt work for this.
Encourages an application architecture that facilitates easy configuration management and deployment, as people have to deploy onto their own workstations. Many deployment wrinkles will get sorted out long before they hit production or people even realise they could have gone wrong.
Avoids developers tripping up on each other's work. On something like a data warehouse where people are working with ETL code this is an even bigger win.
It encourages a degree of responsibility as developers have to learn basic database administration. This also eliminates a lot of the requirements for operational support staff and some of the dev-vs. ops friction.
If you have your own database, there are no gatekeepers obstructing experimentation or other work on it. The politics around managing 'servers' disappear as there are no 'servers'.
This is a productivity win in any environment with significant incumbent bureaucracy.
For small data volumes an ordinary PC is fast enough. Developer editions or licences are available for most if not all database management systems and will run on a desktop OS. If you're working with Linux or Unix this is even less of an issue. For larger data volumes, up to and including most MIS applications, a workstation like an HP XW9400 or Lenovo D10 can be outfitted with five 15k disks for less than the cost of a lot of professional development tooling. (Yes, I know it's dual-licensed, but a commercial all-platform licence for Qt is about £4000 a seat.)
A machine like this will run an ETL process over tens to hundreds of millions of rows faster than you might think.
It facilitates setting up more than one environment for smoke testing or reconciliation purposes. As you have complete control over the machine, you have quite a lot of scope for mocking up conditions in a production environment. For example, I once made a simple emulator for Control-M by just bodging some of its runtime scripts.
Where you have this level of control and transparency over the environment you can produce a fairly robustly tested deployment process which does quite a lot to eliminate opportunities for finger-pointing in production deployment.
I've seen small teams working with 14 environments, and have had 7 active on one workstation at the same time. On database-heavy work such as ETL, where you're working with whole tables, a single shared dev environment is a recipe for wasted time or for spending your time walking on eggshells.
Also, you can use single-user development licences for database platforms, which can save enough on database licensing alone to pay for the workstations. Most developer licences (Microsoft and OTN are a couple of examples I'm familiar with) let a single developer use the system on a single workstation for free or for a nominal price.
Conversely, licencing terms on shared development servers are often somewhat murky and I've seen vendors try to shake customers down for licencing on dev servers on more than one occasion.
Each of our developers has a fully functional database. Changes are scripted and source controlled like any other code.
Ideally, yes, each developer should have a "sandbox" development environment, so they can test their code even before deploying it to a shared testing/staging environment.
Each developer's environment should run scripted tests that reset the database to a known state. This is impossible to do in a shared environment.
The cost of giving each developer their own instance is less than the cost of the chaos resulting from multiple developers trying to test volatile changes together in a shared environment.
On the other hand, in many IT shops the system uses complex infrastructure, involving multiple application servers or multiple physical nodes. Then the economics change; it's less expensive for people to cooperate and avoid stepping on each other's work than it would be to replicate it for each developer. Especially true if you integrate expensive third-party systems that don't give you licenses for multiple development environments.
So the answer is yes and no. :-) Do give each developer their own environment if that environment can be reproduced inexpensively.
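Resetting the database to a known state before each test can be sketched with Python's `unittest` and an in-memory SQLite database. The `products` schema and seed rows are hypothetical and echo the "Active" flag example from an earlier answer.

```python
import sqlite3
import unittest

# Hypothetical schema and seed data; in practice both come from source control.
SCHEMA = "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);"
SEED = [(1, "widget", 1), (2, "gadget", 1)]


class ProductTests(unittest.TestCase):
    def setUp(self):
        # Every test starts from the same known state, whatever earlier tests did
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(SCHEMA)
        self.db.executemany("INSERT INTO products VALUES (?, ?, ?)", SEED)

    def tearDown(self):
        self.db.close()

    def active_count(self):
        return self.db.execute("SELECT COUNT(*) FROM products WHERE active = 1").fetchone()[0]

    def test_deactivate(self):
        self.db.execute("UPDATE products SET active = 0 WHERE id = 1")
        self.assertEqual(self.active_count(), 1)

    def test_all_active_initially(self):
        self.assertEqual(self.active_count(), 2)
```

Run with `python -m unittest`. Because `setUp` rebuilds the data for every test, the deactivation in one test cannot leak into the other.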
My recommendation is to have 2 levels of development environment:
Each developer has their own personal development system, with its own DB, web servers, etc. This allows them to code against a known setup, write automated (system-level) tests that initialize their database and systems to a known state, etc.
The development integration environment is shared by all developers and used to make sure everything is working together as expected before handing it off to QA. Code is checked out from source control and installed there, and there's only a single instance of any servers (db or otherwise).
This question hints at what a developer needs to do his/her job. Certainly a private DB instance should be provided. Equally important, I would make sure that the DB is the same product/version as what you intend to deploy to. Don't develop on MySQL 6.x and deploy to MySQL 5.x. (This goes for app servers, and web servers as well!)
Having a developer DB doesn't necessarily mean it has to be hosted on your local machine. You could have a central DBMS host with all the developer DBs on it. The pros: a guarantee that you develop against the target DB, and less overhead on developer machines, leaving more space and horsepower for beefy IDEs and app servers. The cons: a single point of failure for all developers (if the DBMS server goes down, nobody can work), a lack of developer exposure to setting up and administering the DBMS, and developers cannot as easily experiment with upcoming DB releases or alternative databases to solve tough problems.
Some of the pros can be cons and vice versa, depending on your organization and structure. Maybe you don't want developers administering the DBMS. Maybe you do plan to support various DB platforms. The decision comes down to your organization as well as your target platform choices. If you plan to target a variety of DB/OS/app server combinations, then each developer should not only have their own DB but should work in a unique combination (MySQL/Tomcat/OS X for one, DB2/Jetty/Linux for another, Postgres/Geronimo/WinXP for a third, etc.). If, on the other hand, you set up an ASP (Application Service Provider) type shop on an iSeries, then you'll likely have a central host with all the developer DBMSes; even so, each developer should have at least a separate DB instance to allow structural changes to the schema.
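Matching the development DB version to production can be enforced with a startup check. This is a sketch: the version strings would come from something like `SELECT VERSION()` on MySQL, and how many version components to compare depends on the product's numbering scheme (PostgreSQL 10 and later use a single major number, so `significant=1` fits there).

```python
def version_key(version, significant=2):
    """'5.7.44-log' -> (5, 7); drops vendor suffixes and anything after a space."""
    core = version.split()[0].split("-", 1)[0]
    return tuple(int(part) for part in core.split(".")[:significant])


def check_same_version(dev_version, prod_version, significant=2):
    """Fail fast when the development DB differs from production."""
    if version_key(dev_version, significant) != version_key(prod_version, significant):
        raise RuntimeError(
            f"development DB is {dev_version}, but production runs {prod_version}"
        )
```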
I have an instance of SQLServer Development Edition installed locally. We have a QA DB server, as well as multiple production servers. All development and integration testing is done using my local server (or other developers local servers). New releases are staged to the QA server. Each release, after acceptance by the customer, is put into production.
Since I mostly do web development, I use the web server bundled with VS2008 for development and local test, then publish the web app to a QA web server hosted on a VM. Once accepted by the customer, it is published to one of several different production web servers -- some virtual, some not, depending on the application.
My department at my company only has limited development environments, purely because of cost of support and hardware. We have a couple of environments which are based on t-1 nightly refreshes from production, and some static ones.
Ideally, everyone should have their own, but in many cases, this is going to be impractical when the following are true:
you have a large number of developers needing resources (our department has maybe 80)
each developer needs multiple resources (typically I use 4-5 different DBs each day)
up-to-date data is important (you just can't refresh them fast enough)
In these cases, shared instances and good communication are what's needed.
One advantage to one database per developer, each developer has a snapshot of their own data in a "known" state.
I like the idea of using a local version when a developer must be isolated - developing a schema change, performance testing, setting up specific scenarios, etc...
At other times, use the shared version to ensure everything is in sync.
I think there's a terminology problem here. It's been a while since I've worn my DBA hat (golly gee, almost 10 years) - so someone else can chime in and correct me.
I think everyone is in agreement that each developer should have his own sandbox schema set.
In MySQL and Sybase/MS SQL Server, each database engine can support multiple databases. Each database is (normally) fully independent of the others. So you can have one database engine instance and give each developer his own database to do with as he wishes. The only problem is if the developers are using tempdb: there can be collisions there (I think; you will need to look this up). Just be careful not to use cross-database queries with hard-coded database names.
In Oracle, the database engine instance is tied to a particular schema set. If you have multiple developers on the same engine, they are all pointing to the same tables. In this case, yes, you will need to run multiple instances.
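On engines that host many databases per instance, giving each developer a database can be scripted. This sketch generates per-developer names and `CREATE DATABASE` statements; the naming scheme is hypothetical. Since DDL cannot take bind parameters, names are validated against a strict pattern instead. Application code should read the database name from configuration rather than hard-coding it, which also avoids the cross-database query problem mentioned above.

```python
import re

_IDENT = re.compile(r"^[a-z][a-z0-9_]*$")  # conservative identifier rule


def dev_database_name(app, developer):
    """One database per developer on a shared engine, e.g. 'billing_dev_alice'."""
    name = f"{app}_dev_{developer}".lower()
    if not _IDENT.match(name):
        raise ValueError(f"unsafe database name: {name!r}")
    return name


def create_statements(app, developers):
    # DDL can't use bind parameters, hence the strict name check above
    return [f"CREATE DATABASE {dev_database_name(app, d)}" for d in developers]
```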
Each of our developers has a local database. We store the create script AND a dump of the "standard data" in our SVN repo. We have an extensive set of tests that must pass against this test data. We also have a "sandbox" database that is available for people to put data in that they want shared into the standard data. This works well for us and allows us to let developers modify their local copies of data to test things, but we control what gets passed to other developers. We also strictly control schema changes, so we don't encounter the problems that someone else mentioned.
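A sketch of rebuilding a local database from the create script and the "standard data" dump, with SQLite standing in for the real DBMS; the file paths (e.g. db/create.sql, db/standard_data.sql) are hypothetical.

```python
import sqlite3
from pathlib import Path


def rebuild_local_db(db_path, create_script, standard_data):
    """Recreate a developer DB from scratch: schema first, then the shared test data."""
    path = Path(db_path)
    if path.exists():
        path.unlink()  # start from an empty database every time
    conn = sqlite3.connect(path)
    try:
        conn.executescript(Path(create_script).read_text())
        conn.executescript(Path(standard_data).read_text())
        conn.commit()
    finally:
        conn.close()
```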
It really depends on the nature of your application. If yours is a client-server architecture in a distributed environment, it is best to have a central database that everyone uses. If the product gives users an environment with local database instances, you can use that. It is best if your development mirrors the real world environment as closely as possible.
It also depends on what stage of development you are in. In the early stages you probably don't want to get bogged down by connectivity, network and distributed-environment issues and just want to be up and running. In that case, you can start with a database-instance-per-user model before switching to the central model as the product matures.
In my company we tend to copy the entire DB when working on non-trivial new features. The reasoning there is disk space is cheap, whereas accidental data loss (even if it's test data) isn't.
I've worked in both types of development environments. Personally, I prefer to have my own DB/app server. However, there may be some advantages to using a shared infrastructure for development.
The main one is that a shared environment more closely resembles a real-world scenario: you are more likely to uncover problems with locking or transactions when all developers share a DB. Giving each developer their own DB may lead to "it works on my DB" syndrome.
However, if you need to apply and test schema changes or optimisations, then I can see problems in this sort of set-up.
Maybe a compromise solution would work best: all developers share infrastructure, and if someone needs to test schema changes, they create their own temporary DB instance (maybe there is one just sitting there for this purpose?) until they are happy to commit the new schema to source control.
You do have your entire schema (and test data) in source control, right? Right???
I like the compromise solution (all developers share infrastructure, and if someone needs to test schema changes, they create their own temporary DB instance (maybe there is one just sitting there for this purpose?) until they are happy to commit the new schema to source control.)
One DB per developer. No question. But the real issue is how to script entire databases and "control data", and version them. My solution is here: http://dbsourcetools.codeplex.com/
Have fun. - Nathan.
The database schemas should be held in source control, and developers should own the changesets checked in for code and DB together. Before check-in, the developer should be working on his own database. After check-in, an automated build (e.g. on check-in or nightly) should update a central integrated DB along with the apps themselves.
At developer instance level the data loaded should be appropriate for unit testing, at least. At integrated level, the shared db should hold data also appropriate for testing, but should not rely on production replication - this is just a slack substitute for managed test data.
In my experience, the only reason developers opt for a shared DB is that they believe developing and running on recent production data is somehow 'real' and means they can put less effort into testing. They prefer to tread on each other's toes and put up with a shared DB that slowly becomes corrupted until the next production refresh rather than write and manage proper tests. It's this kind of management practice that gives the IT world its current poor reputation for delivery.
I'd suggest to use one instance of the database. You don't want your database to be a moving target.

Resources