Is it good practice to have the database within the same container as the app?

We have several sites running under a CMS on virtual machines. Basically we have three VMs running the CMS and a SQL instance to store data. We plan to transition to containers, but to be honest I don't know much about them yet, and my boss wants the full app (CMS and DB) in one image, deployed as many containers as needed (initially three). My doubt is that, as far as I know, containers work best when the different parts are separated and run as microservices, so I don't know whether it's a good idea to have the full app inside one container.

The short answer is: no.
It's best practice with containers to have one process per container. A container has an entrypoint, basically the command that is executed when the container starts, and this entrypoint should be the command that starts your process. If you want more than one process, you need a script in the container that starts them and puts them in the background, which complicates the whole setup. See also the Docker docs.
There are some more downsides.
A container should contain only what it needs to run its process. If you run more than one process, you end up with one big container. You're also no longer free in your choice of base image: you have to find one that fits all the processes you want to run, and you might run into dependency trouble because different processes may need different versions of the same dependency (such as a library).
You're unable to scale independently. E.g. you could run five CMS containers that all use the same database, for redundancy and performance. That's not possible when everything lives in the same container.
Detecting and debugging faults gets harder. If more than one process runs in a container, the container might fail because one of the processes failed, but you can't be sure which one. With one process per container, a failed container tells you exactly what broke. It's also easier to monitor health, because there is one health-check endpoint per container. Last but not least, the container's logs represent the logs of one process, not of several mixed together.
Updating becomes easier. When updating your CMS to the next version, or updating the database, you only need to update the image of that one process. E.g. the database doesn't need to be stopped and restarted when you update the CMS.
The container can be reused more easily. You can e.g. use the same image everywhere and mount the customer specifics from a volume, configmap or environment variable.
If you do want your CMS and database together, you can use the sidecar pattern in Kubernetes: simply declare a pod with multiple containers in the manifest. Note that this, too, will not make it horizontally scalable.
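A minimal sketch of that sidecar setup (container names, images, and the secret reference are all hypothetical):

```yaml
# Pod with CMS and database side by side -- runs as one unit, not horizontally scalable
apiVersion: v1
kind: Pod
metadata:
  name: cms-with-db
spec:
  containers:
    - name: cms
      image: my-cms:latest        # hypothetical CMS image
      ports:
        - containerPort: 80
    - name: db
      image: mysql:8
      env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secret     # hypothetical secret
              key: password
```

Both containers share the pod's network namespace, so the CMS can reach MySQL on localhost, but the pod can only ever be replicated as a whole.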

That's a fair question that most of us go through at some point. One tends to have everything in the same container for convenience but then later regret that choice.
So, best to do it right from the start and to have one container for the app and one for the database.
According to Docker's documentation,
Up to this point, we have been working with single container apps. But, we now want to add MySQL to the application stack. The following question often arises - “Where will MySQL run? Install it in the same container or run it separately?” In general, each container should do one thing and do it well.
(...)
So, we will update our application to work like this:
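In Compose terms, the separation the tutorial describes looks roughly like this (service names, the CMS image, and environment variables are illustrative):

```yaml
# docker-compose.yml -- app and MySQL as two separate services
services:
  app:
    image: my-cms:latest      # hypothetical CMS image
    depends_on:
      - db
    environment:
      DB_HOST: db             # the db service is reachable by name on the compose network
  db:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: example
    volumes:
      - db-data:/var/lib/mysql   # data survives container replacement
volumes:
  db-data:
```

With this split, `docker compose up --scale app=3` runs three CMS containers against a single database, which is exactly what the combined image cannot do.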

It's not clear what you mean by CMS (content/customer/... management system). Nonetheless, the milestones on the way to creating/separating an application (monolith vs. microservices) would probably be:
if the application is a small one, start with a monolithic structure (the whole application as one executable on an application/web server)
otherwise, determine which parts should be separated (-> Domain-Driven Design)
if the monolithic structure grows and you add more domain-related services, pull it apart along well-defined separations that follow your domain landscape:
Once you have a firm grasp on why you think microservices are a good idea, you can use this understanding to help prioritize which microservices to create first. Want to scale the application? Functionality that currently constrains the system's ability to handle load is going to be high on the list. Want to improve time to market? Look at the system's volatility to identify those pieces of functionality that change most frequently, and see if they would work as microservices. You can use static analysis tools like CodeScene to quickly find volatile parts of your codebase.
"Building Microservices", S. Newman
Database
According to the principle of "hiding internal state", every microservice should have its own database.
If a microservice wants to access data held by another microservice, it should go and ask that second microservice for the data ... which allows us to clearly separate functionality.
In the long run this could be perfected into completely separated end-to-end slices, each backed by its own database (UI-LOGIC-DATA). In the context of microservices:
sharing databases is one of the worst things you can do if you’re trying to achieve independent deployability
So the general way of choice would be, more or less: each service with its own database.

Related

Multiple independently-scaling programs within one Google Cloud Project

I'm new to cloud programming but it's my understanding that the server I'm building can be run in multiple regions as multiple "instances" to improve user experience. In other words, my server code is just running on several different machines at once, all independently of each other. However, things like a database server for example should only be run as a single instance no matter how many server instances there are.
Is there a way to do this using Google App Engine? More specifically, is there a way to categorize portions of the project as scalable and others as non-scalable? My initial plan was simply to make two different projects - one which scales automatically and one which does not scale - and have them communicate through network requests. This could (potentially?) have the added benefit of spreading the resources used by my project across multiple cloud projects, reducing the per-project usage for billing purposes.
I'd love to know if I'm on the right track, or if what I'm doing is over-complicating things.
Yes, this is possible. You will need to create two separate App Engine services, set one to automatic scaling, and set the other to manual scaling with however many instances you want up and running all the time. You can read this documentation for more details on the types of scaling, and this documentation on how to set this up in your app.yaml file.
That being said, I don't think this will reduce your cost. In fact the opposite might happen: App Engine is designed to reduce wasted resources as much as possible through auto scaling, and if you use manual scaling with more instances than you actually need, you will be charged more. So factor this into the design of your infrastructure. I also recommend taking a look at the App Engine pricing documentation.
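The two-service split is expressed in each service's app.yaml (the runtime and instance counts here are illustrative; in practice these are two separate files, one per service):

```yaml
# app.yaml of the stateless, auto-scaling service
service: default
runtime: python39
automatic_scaling:
  max_instances: 10
---
# app.yaml of the single-instance service (a separate file in practice)
service: backend
runtime: python39
manual_scaling:
  instances: 1
```

Both services live in the same project and can call each other over HTTP, so you don't need two projects just to mix scaling modes.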

How to implement continuous delivery on a platform consisting of multiple applications which all depends on one database and each other?

We are working on an old project consisting of multiple applications which all use the same database and strongly depend on each other. Because of the size of the project, we can't refactor the code so that they all go through the API as a single source of database access. The platform contains the following applications:
Website
Admin / CMS
API
Cronjobs
Right now we want to start implementing a CI/CD pipeline using Gitlab. We are currently experiencing problems, because we can't update the database for the deployment of one application without breaking all other applications (unless we deploy all applications).
I was thinking about a solution where one pipeline triggers all the others. Every pipeline would execute all newly added database migrations and test whether its application still works as it should. If all pipelines succeed, the deployment of all applications would be started.
I'm doubting if this is a good solution, because this change will only increase the already high coupling between our applications. Does anybody know a better solution how to implement CI/CD for our platform?
You have to stop thinking about these as separate applications. You have a monolith with multiple modules, and until they can be decoupled, they are all one application and will have to be deployed as such.
Fighting this by pretending they aren't is likely a waste of time; your effort would be better spent actually decoupling these systems.
There are likely a lot of solutions, but one that I've done in the past is create a separate repository for the CI/CD of the entire system.
Each individual repo builds that component, and then you can create tags as they are released or ready for CI at a system level.
The separate CI/CD repo pulls in the appropriate tags for each item and runs CI/CD against all of them as one unit. This allows you to specify which tag for each repo you want to specify, which should prevent this pipeline from failing when changes are made on the individual components.
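A sketch of such a system-level pipeline (the repo layout, tag variables, and helper scripts are hypothetical):

```yaml
# .gitlab-ci.yml in the system-level CI/CD repo
variables:
  WEBSITE_TAG: "1.4.2"   # pinned tag of each component repo
  CMS_TAG: "3.1.0"
  API_TAG: "2.0.1"

stages: [migrate, test, deploy]

migrate:
  stage: migrate
  script:
    - ./run-migrations.sh "$API_TAG"   # apply all pending DB migrations once, up front

test:
  stage: test
  script:
    - ./test-all.sh "$WEBSITE_TAG" "$CMS_TAG" "$API_TAG"  # integration-test the pinned set together

deploy:
  stage: deploy
  script:
    - ./deploy-all.sh   # deploy every application as one unit
  when: manual
```

Because the tags are pinned in one place, a change in an individual component repo can't break this pipeline until someone deliberately bumps its tag here.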
Ask yourself why these "distinct applications" are using "one and the same database". Is it because every single one of those "distinct applications" deals with one and the same business semantics? If so, as Rob already stated, then you simply have one single application (and on top of that, there will be no decoupling, precisely because your business semantics are singular/atomic/...).
Or are there discernible portions in the db structure, such that a highly accurate mapping could be identified saying "this component uses that portion", etc.? In that case, what is it that causes you to say things like "can't update the database for the deployment of ..."? (BTW, "update the database" is not the same thing as "restructure the database". Please, please, please be precise.) The answer to that will identify what you've got to tackle.

Migrating existing infrastructure & scaling with Terraform

We are planning to automate the creation and deletion of VMs in the DCs which power our cloud service. The service is such that every new customer gets dedicated VMs (at least 3), so the number of VMs keeps growing. We already have about 2000 VMs running on ESXi. So we have two problems to solve before adopting Terraform:
How do we migrate existing VMs to be managed by Terraform (or should we, at all)?
Generating resource specifications could be scripted, but verifying the plan to ensure nothing is affected will be a challenge; the volume of VMs, and the fact that they are all live, puts extra pressure on the engineers.
As the number of VMs increases, the number of .tf files on disk keeps increasing. We could club multiple VMs into a single file, but that would make programmatic deletion of individual VMs a bit tricky. Splitting files into multiple directories is a simple workaround I can think of, but...
Is there a better way to handle scale with terraform?
I couldn't find any blogs which discuss these problems, hence looking for some advice from practical experience here.
Good to see the community starting to ask Terraform-related questions on Stack Overflow more and more.
Regarding your questions:
Migrating existing VMs to be managed by Terraform means updating the tfstate file. As of now there is no way to automatically create resource definitions for already-created resources and put them into the state file (though there are tools like Terraforming, which does it partially and only for AWS resources). So you will have to describe the resources in *.tf files, update the tfstate file manually, and then verify that tfstate == tf by running terraform plan, which should report that there are no changes to apply. Regarding what exactly to put into the tfstate file: I would recommend creating the resource definition in tf first, then creating a dummy VM (terraform apply) based on it, finding the relevant objects in the updated tfstate file, and replacing those dummy values with the real values of your VMs (you will also need to update serial to prevent a local/remote state inconsistency error).
I don't know a smarter way of handling a large number of related resources than grouping them by directories. That way you can execute plan/apply just for specific, logically separated directories, but you will have to keep separate state files. It may easily be overkill (kind-of-warning-so-do-not-try-at-home).
These are the suggestions I mostly keep in mind when working with Terraform (especially with a large number of resources, as you have):
Organize your code so that you have modules in one place and pass parameters into them in another place. Reusability of code, as it is called now :)
Use the -target flag on commands like terraform plan and terraform apply to limit the resources you want to touch.
Hope it helps! And more people will enjoy Terraform.
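The module layout from the first suggestion might look like this (resource arguments are trimmed, and all names are hypothetical):

```hcl
# modules/vm/main.tf -- one reusable VM definition
variable "name" {}
variable "num_cpus" { default = 2 }

resource "vsphere_virtual_machine" "vm" {
  name     = "${var.name}"
  num_cpus = "${var.num_cpus}"
  # ... resource_pool_id, datastore_id, disk, network_interface, etc. ...
}

# customers/customer42/main.tf -- per-customer instantiation
module "vm1" {
  source   = "../../modules/vm"
  name     = "customer42-vm1"
  num_cpus = 4
}
```

Running terraform plan inside customers/customer42/ then touches only that customer's resources and state, which keeps plans small and deletions of individual customers simple.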

CMS Database design - Master database or Multi-Db per site

I am in process of designing my CMS that I am about to create. I was thinking about the database and how I want to go by approaching it.
Do you think it's best to create one master database for all my clients' websites, or should I have one database per site?
What are the benefits and drawbacks of both approaches? I am always thinking about the future, so I was also considering adding memcache or APC caching to the project, to offer as an option to my clients.
Just trying to learn the best practices and what other developers' approach would be.
I've run both. My business chooses to separate client-specific data into separate databases, so that if one happens to go corrupt, not all are taken down. In an ideal world this might never happen, but Murphy's law... It does seem very easy to find things with them separated, and you know with 100% certainty that one client's content will never show up on another's page.
If you do go down that route, be prepared to create scripts that build and configure databases for you. There's nothing fun about building a great system and having demand for it, only to spend your time manually setting up DBs and installs all day long. Also, picking db names is one additional step that's not part of using a single db; it's a headache that will repeat itself seemingly over and over again.
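Those provisioning scripts can be fairly small. Here is a sketch using sqlite3 so it is self-contained; a real multi-tenant CMS would target MySQL or PostgreSQL, and the schema below is made up for illustration:

```python
import sqlite3
from pathlib import Path

# Made-up minimal CMS schema, created identically for every client.
SCHEMA = """
CREATE TABLE pages (id INTEGER PRIMARY KEY, slug TEXT NOT NULL, body TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
"""

def provision_client_db(client: str, base_dir: str = ".") -> Path:
    """Create and initialize one database file per client."""
    db_path = Path(base_dir) / f"cms_{client}.db"
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(SCHEMA)
    finally:
        conn.close()
    return db_path

if __name__ == "__main__":
    # Onboarding two hypothetical clients becomes one script run.
    for client in ["acme", "globex"]:
        print(provision_client_db(client))
```

Automating this from the start is what makes the one-database-per-client model bearable at scale.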
Develop the single master DB. It will take a small amount of additional effort and add a little bit more complexity to the database design, but will give you a few nice features. The biggest is being able to share data between sites.
Designing for a master database means that you have the option to combine sites when it makes sense, but also lets you install a master per site. Best of both worlds.
It depends greatly upon the amount of customization each client will require. If you foresee clients asking for many one-off features specific to their deployment, separate databases based on a single core structure might make sense. I would highly recommend trying to make any customization usable by all clients, though, and keeping all structure defined in one place/database instead of duplicating it across multiple databases. By using one database, you make updating the structure straightforward and the implementation consistent across all sites, so they can all use the same CMS code.

How to organize a database in Django with multiple, distinct apps?

I'm new to Django (and databases in general), and I'm not sure how to structure the following. The sources of data I'll have for my site are:
a blog
for a few different games:
a high score list
user-created levels
If I were storing the data in ordinary files, I'd just have one file for each of the above. In Django, ideally (I think) I'd have a separate database for each of these, but apparently multiple database support isn't there for Django yet. I'm worried (unnecessarily?) about keeping everything in one database for two reasons:
If I screw something up in one of the sections, I don't want to mess up the rest of the data.
When I'm working on one of these sections, I'd like the freedom to easily change the model around. Since I've learned that syncdb doesn't, in fact, sync the database, I've decided that the easiest thing to do when messing around with a model is to simply wipe the database and start over. Again, I'm worried about messing up the other sections. I looked at south, and it seems like more trouble than it's worth during the planning stages of an app (but I'll reconsider later when there's actually valuable data).
Part of the problem is that I'm not really comfortable keeping my data in a binary format. I'm used to text, so I can easily diff it, modify it in an editor, etc., without going through some magical database interface (I'm using postgresql, by the way).
Are my fears unfounded? How do people normally handle this problem?
For what it's worth, I totally understand your frustration as I went through a really similar thought process when starting. Unfortunately, there isn't much you can do (easily, anyway) besides get familiar with the tools you'll be using.
First of all, you don't need multiple databases for what you're doing - one will do. Each app will create its own tables in the database which are somewhat isolated from one another. As czarchaic mentioned, you can do python manage.py reset app_name to reset just one of them in case you change your model. You will lose that data, though.
To get data in relatively easy to work with format, you can use the command python manage.py dumpdata > file_name.json, and then to reload it later python manage.py loaddata file_name.json. You can use this for backups, local test data, or as a poor man's migration (hint: South would be easier).
Yet another option is to use a NoSQL database for any data you think will need extra flexibility. Just keep in mind that Django doesn't have support for any of these at the moment. That means no admin support or ModelForms. Of course, having a model may become unnecessary.
In short, your fears are unfounded. You should "organize" your database by project, to use the Django term. Each model in each app will have its own table, but they will all be in the same database. Putting them in separate databases isn't a good idea for a whole host of reasons; the biggest is that you cannot query across databases.
While I agree that south is probably a bit heavy for your initial design/dev stages it should be considered seriously for anything even resembling a beta version and absolutely necessary in production.
If you're going to be messing with your models a bunch during development the best thing to do is use fixtures to load data in quickly after running sync. Or, if you are going to be changing a bunch of required fields, then write some quick Python to create dummy data for you.
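A quick dummy-data helper in that spirit, writing a Django-style fixture file (the blog.post model and its fields are hypothetical):

```python
import json
import random

def make_fixture(n: int = 10) -> list:
    """Build Django-fixture entries for a hypothetical blog.Post model."""
    return [
        {
            "model": "blog.post",
            "pk": i,
            "fields": {
                "title": f"Dummy post {i}",
                "score": random.randint(0, 100),
            },
        }
        for i in range(1, n + 1)
    ]

if __name__ == "__main__":
    with open("dummy_posts.json", "w") as f:
        json.dump(make_fixture(), f, indent=2)
    # After wiping and re-syncing, reload with:
    #   python manage.py loaddata dummy_posts.json
```

Regenerating the fixture after each model change is much faster than hand-entering test data through the admin.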
As for not trusting your data to a binary format, a simple pg_dump will get you a textual version of your data. It sounds to me like you're working on your app against live production data, which is a mistake. Your goal should be to get your application built, working, and tested against fake data, or at least a copy of your production data, and only when you're sure everything is golden migrate it into production. This is where things like South come in handy, as you can automate this deployment and it will help you handle any table/column changes you need to make.
I'm sure all of this sounds like a pain, but all of it is able to be automated and trust me it makes your life down the road much much easier.
I generally just reset the app:
$ python manage.py reset blog
This will reset all tables belonging to the blog app in INSTALLED_APPS.
I'm not sure if this answers your question, but it's much less destructive than wiping the whole DB.
Syncdb should really only be used for development. That's why it doesn't really matter if you wipe the tables and start again, perhaps exporting lookup data into a json file that you can import each time you sync.
When your site reaches production, however, you have a little more work. Any changes you make to your models that need to be reflected in the database need to be emitted as SQL and run manually on the database. There's a django-admin.py command to emit the suggested SQL, which you can use to build up a script to run on the database to migrate it. As you mention, a migrations app like South can be beneficial here, but it's not strictly needed.
As far as your separation of sites goes, run them as separate sites/projects. You can have a separate settings file per project, which allows you to run two different databases. This is in contrast to running the two sites as separate apps within the same project. If they're totally separate, they probably shouldn't be in the same project unless you need to share common code.
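The per-project settings split amounts to giving each settings file its own DATABASES entry (database names and credentials are illustrative; this is the DATABASES dict form used by Django 1.2 and later):

```python
# settings_site_a.py -- hypothetical settings for the first site
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "site_a",            # each project points at its own database
        "USER": "site_a_user",
        "PASSWORD": "change-me",
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```

A second project would carry an identical block pointing at its own database, so the two sites never share tables.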
