Is it an anti-pattern to keep a database in a container?

I have a question regarding best practices with containers. Is it an anti-pattern to have a database in a container?
I've seen implementations of DBs in containers a couple times now and I'd love to get y'all's thoughts on it. From my understanding, containers should be lightweight and effectively stateless. They should also operate as cattle, not pets (as in, easily destroyed and you don't rely on one container staying to perform business functions).
From what I know of DBs, they aren't usually cattle, and depending on the application they aren't lightweight. They're also pretty much inherently stateful.
It's pretty clear that I'm skeptical of DBs being hosted in containers, but I'd really love to hear what y'all think. I'm not too familiar with DBA work, so hearing from those with more experience (especially if you've implemented it and have experiences you can speak to) would be great.

It's a great question, though it's a bit broad. It completely depends on what exactly you are running and how you plan your workloads.
The thing to keep in mind about containers is that there really isn't any magic here. A container ultimately boils down to kernel-level (cgroup) limits imposed on a process, and the orchestration layer (e.g. Kubernetes or Cloud Foundry Diego) is responsible for reacting when the container is killed off for crossing those limits (e.g. out of memory).
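To make the "no magic" point concrete: from inside the container, that memory ceiling is just a value the kernel exposes through the cgroup filesystem. A minimal sketch in Java, assuming a cgroup v2 host (on cgroup v1 the file is /sys/fs/cgroup/memory/memory.limit_in_bytes instead):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class CgroupLimit {
    public static void main(String[] args) throws Exception {
        // cgroup v2 exposes the memory limit applied to this container here;
        // the literal string "max" means no limit was set.
        Path limitFile = Path.of("/sys/fs/cgroup/memory.max");
        String limit = Files.readString(limitFile).trim();
        System.out.println("Memory limit for this container: " + limit);
        // Exceed it and the kernel OOM-kills the process; the orchestrator
        // (Kubernetes, Diego, ...) only reacts after the fact.
    }
}
```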
In general, there are a number of high-level factors to keep in mind:
What are the data durability requirements for this project?
What are the workloads (e.g. hourly spikes, unpredictable load, etc.)?
What is your uptime SLA, and can your clients gracefully handle failing over to new masters in your data tier?
Most importantly, is containerization the right pattern for what your project's data tier is trying to achieve?
Beyond this, you have to look at the characteristics of your orchestration environment. If you need to be able to persist disk contents, make sure you pick a container orchestrator that can fulfill that requirement.
You may have something like a sharded MongoDB cluster using the In-Memory storage engine as a caching layer that needs a bit more capability than a typical key-value store like memcached (e.g. the ability to query/filter the cache itself). Depending on your project's requirements, it may be perfectly fine to lose this "cache" layer and rebuild it on demand.
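To illustrate the "query/filter the cache itself" capability, here is a minimal sketch using the MongoDB Java sync driver; the connection string, collection, and field names are invented for the example:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

public class CacheQuery {
    public static void main(String[] args) {
        // Hypothetical connection string and names -- adjust for your cluster.
        try (MongoClient client = MongoClients.create("mongodb://mongo-cache:27017")) {
            MongoCollection<Document> sessions =
                    client.getDatabase("cache").getCollection("sessions");

            // Unlike a plain key-value store, the cache can be filtered on arbitrary fields.
            for (Document d : sessions.find(
                    Filters.and(Filters.eq("tenant", "acme"),
                                Filters.gt("score", 100)))) {
                System.out.println(d.toJson());
            }
        }
    }
}
```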
For other workloads, you could run something like EnterpriseDB ARK to provide clustered, highly available, containerized PostgreSQL deployments on top of Kubernetes. This comes with its own challenges, but it lets you implement a service-broker model in your microservices architecture, deploying and persisting a data tier for each of your microservices instead of a monolithic data tier, which is prone to noisy-neighbor problems in this type of architecture.

Related

Database for a java application in cluster

I'd like to play around with Kubernetes. I'm able to start a simple app, but now I'd like to design something more complex. Nevertheless, I can't figure out how to handle database access in such an architecture.
Let's say I have 100 pod replicas of some simple chat application. They all need to access the same database (or rather, the same data set) and perform CRUD operations on it. How do I design this so that the data stays consistent and the risk of deadlocks is eliminated?
If possible, I'd like to use an SQL-like database, so I can comfortably use Hibernate and other tools I'm familiar with.
Is this even possible, or do I have to use a totally different approach? What is the name of the technology or architecture I'm searching for?
1) You can use a connection pool to reduce the number of connections and make the connection settings more aggressive/elastic (see the sketch after this list).
2) Split your microservices in such a way that access to the persistence layer is itself a microservice, exposing your CRUD operations against your store (MySQL/RDBMS/NoSQL/etc.). That way you most likely don't need hundreds of replicas of your pods.
3) Deadlocks / locking strategies - as Andrew mentioned in the comments, this is more related to your software architecture than to K8s itself. There are plenty of ways to deal with it, each with its own pros and cons.
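To sketch point 1: a pooled DataSource caps how many connections each replica opens, regardless of how many request threads it serves. A minimal example with HikariCP, where the JDBC URL, credentials, and pool size are placeholders:

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class ChatDataSource {
    public static HikariDataSource create() {
        HikariConfig config = new HikariConfig();
        // Placeholder connection details -- point these at your own database.
        config.setJdbcUrl("jdbc:mysql://chat-db:3306/chat");
        config.setUsername("chat");
        config.setPassword("secret");
        // Each pod opens at most 5 connections, so 100 replicas mean at most
        // 500 connections on the database, instead of one per request thread.
        config.setMaximumPoolSize(5);
        config.setMinimumIdle(1);
        config.setIdleTimeout(60_000); // drop idle connections after a minute
        return new HikariDataSource(config);
    }
}
```

Hibernate can be pointed at a DataSource like this, so the tools you're familiar with still apply.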

Stateless Micro services and database

We have a requirement to build stateless microservices which rely on a database cluster to persist data.
What is the recommended approach for redundant stateless microservices (for high availability and scalability) using the database cluster? For example: running multiple copies of the version 1.0 Payment service.
Should all the redundant microservices use a common shared DB schema, or should each have its own schema? With independent schemas, inconsistency among the redundant services may exist.
Also, how can a schema upgrade be handled in the case of a common DB schema?
This is a super broad topic, and rather hard to answer in general terms.
However...
A key requirement for a microservice architecture is that each service should be independent of the others. You should be able to deploy, modify, improve, and scale your microservice independently of the others.
This means you do not want to share anything other than API definitions. You certainly don't want to share a schema; each service should be able to define its own schema, release new versions, change data types etc. without having to check with the other services. That's almost impossible with a shared schema.
You may not want to share a physical server. Sharing a server means you cannot make independent promises on scalability and uptime; a big part of the microservice approach is that the team that builds a service is also responsible for running it. You really want to avoid the "well, it worked in dev, so if it doesn't scale in production, it's the operations team's problem" attitude. Databases - especially clustered, redundant databases - can be expensive, so you might compromise on this if you really need to.
As most microservice solutions use containerization and cloud hosting, it's quite unlikely that you'd have the "one database server to rule them all" sitting around. You may find it much better to have each micro service run its own persistence service, rather than sharing.
The common approach to dealing with inconsistencies is to accept them - but to use CQRS to distribute data between microservices, and make sure the micro services deal with their internal consistency requirements.
This also deals with the "should I upgrade my database when I release a new version?" question. If your observers understand the version for each message, they can make decisions on how to store them. For instance, if version 1.0 uses a different set of attributes to version 1.1, the listener can do the mapping.
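As a sketch of that mapping step (the event attributes and version values below are invented for illustration): the listener checks a version attribute on each message and maps older payloads into its current schema before storing them.

```java
import java.util.Map;

// Illustrative only: attribute and version names are made up.
public class CustomerEventListener {

    record Customer(String id, String fullName) {}

    public Customer onMessage(Map<String, Object> message) {
        String version = (String) message.getOrDefault("version", "1.0");
        switch (version) {
            case "1.0":
                // version 1.0 sent separate firstName/lastName attributes
                return new Customer(
                        (String) message.get("customerId"),
                        message.get("firstName") + " " + message.get("lastName"));
            case "1.1":
                // version 1.1 sends a single fullName attribute
                return new Customer(
                        (String) message.get("customerId"),
                        (String) message.get("fullName"));
            default:
                throw new IllegalArgumentException("Unknown message version " + version);
        }
    }
}
```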
In the comments, you ask about consistency. This is a super complex topic - especially in micro service architectures.
If you have, for instance, a "customers" service and an "orders" service, you must make sure that all orders have a valid customer. In a monolithic application, with a single database, and exclusively synchronous interactions, that's easy to enforce at the database level.
In a micro service architecture, where you might have lots of data stores, with no dependencies on each other, and a combination of synchronous and asynchronous calls, it's really hard. This is an inevitable side effect of reducing dependencies between micro services.
The most common approach is "eventual consistency". This typically requires a slightly different application design. For instance, on the "orders" screen, you would first invoke the "customers" microservice (to get customer data) and then the "orders" service (to get order details), rather than having a single (large) service call retrieve everything.
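A rough sketch of that "orders screen" composition, using the JDK's built-in HttpClient (the service URLs are placeholders): two independent calls to two independently owned services, with the screen accepting that the answers may be momentarily out of sync.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OrdersScreen {
    private final HttpClient http = HttpClient.newHttpClient();

    public String render(String customerId) throws Exception {
        // Two separate calls to two separately owned services (placeholder URLs).
        String customer = get("http://customers-service/customers/" + customerId);
        String orders   = get("http://orders-service/orders?customerId=" + customerId);
        // The screen composes both answers; they are only eventually consistent.
        return customer + "\n" + orders;
    }

    private String get(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```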

Synchronizing intranet and web data

I am just getting started breaking a .NET application and its SQL Server database into two systems - an intranet and a public website.
The various database tables will need to be synchronised between the two databases in different ways, for example:
Moving from web to intranet, with the intranet data becoming read-only
Moving from intranet to web, with the web data becoming read-only
Tables that need to be synchronised and are read/write on both the intranet and web databases.
Some of the synchronisation needs to occur relatively quickly with minimal lag, possibly with some type of transaction locking to ensure repeatable reads etc. Other times it doesn't matter if there is a delay between synchronisation.
I am not quite sure where to start with all this, as there seems to be many different ways of achieving this. Which technologies and strategies should I be looking at?
Any tips?
A system like that sounds like its components are fairly tightly coupled. An upgrade across several systems all at once can turn into quite the nightmare.
It looks like this is less of a replication problem and more a problem of how to maintain a constant connection to a remote database without much I/O lag. While it can be done, it probably isn't going to work out very well in terms of scalability and the ability to troubleshoot problems.
You might look at using some message queueing and asynchronous data processing from the remote site to the intranet. You'll probably have to adjust some expectations of the business side so that they don't assume that everything is accessible real-time all the time.
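Your stack is .NET/SQL Server, but purely to illustrate the shape of queue-and-process-asynchronously (the broker URL, queue name, and payload below are invented; any broker with a comparable API would do), here is a minimal JMS-style sketch:

```java
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class WebToIntranetSync {
    public static void main(String[] args) throws Exception {
        // The web side only enqueues a change notification and moves on;
        // an intranet-side consumer applies the change later, at its own pace.
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://broker:61616"); // placeholder broker
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("web.orders.changed"); // made-up queue name
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage("{\"orderId\": 42, \"status\": \"paid\"}"));
        } finally {
            connection.close();
        }
    }
}
```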
Of course, it's hard to give specifics without more details. It might be a good idea to look into the principles of SOA and messaging systems for what you're trying to do.
Out of the box you have SQL Server Replication. It sounds like a pair of filtered transactional replication publications could do the job. Transactional replication has a low overhead on the publisher and can ensure transactional consistency of the published changes.
Nathan raises some very valid points about the need for a more loosely coupled solution. Service Broker can fit that shoe quite well with its loosely coupled, asynchronous nature, and it provides a headache-free upgrade path since SSB is compatible across SQL Server versions and editions. But this freedom comes at the cost of leaving the heavy lifting - actually detecting the changes and applying them to the tables - to you, as application code, which is not a trivial feat.

Webservice/Entity objects or datareader for my winform app?

Is it best to use a webservice to pull data from a database and load it into my entity objects, then send those entity objects to my WinForms app?
Will this make any performance difference compared to going directly to the database and pulling a datareader back to the WinForms client, then loading the entities on the client? Some of the users will be in China accessing a database in the US.
Are there better options?
Thanks
This is subjective, but in general, you will find better performance going directly to the database. That's not good for separation of concerns, however.
Given the highly distributed nature of your system, using web services (or at least an SOA approach) makes sense to me. However, I would go a step further and put the business logic in the web services tier, not just data access; but again, that depends highly on the situation. I just think that the fewer places you have to change code and re-deploy when coding changes are needed, the better.
Is there a reason this has to be a client app and not a web application? A web application would make keeping your distributed users up to date a bit easier.
The best option is probably to have distributed databases and/or distributed servers. No matter how you go from a client app in China to a database in the US, the network will be a massive bottleneck and performance will likely be pretty horrible. If you can put a replicated database in China, that would make a huge positive difference.
Whether you have a webservice or not is not going to be a huge factor here. Sure, adding a webservice adds a network hop, which is going to negatively impact performance, but as I said, I don't think that will be your performance bottleneck.

Physical middle-tier separation for Windows Forms apps

I've been designing quite a few Windows Forms applications lately (data-entry apps, office integration, etc), and although I've always designed with logical tiers in mind, the "middle tier" business logic layer has always been hosted on the desktop PC (i.e. physical client-server).
As I approach more complex applications, I find myself yearning for a physical middle tier instead, with the desktop client passing requests back to an application server to process business logic (which may change regularly) and interfaces. It feels like I'm missing out on factors such as scalability and maintainability that are more native to Web apps.
I'm curious to know how far other WinForms developers have taken their middle-tier separation:
How much processing (if any) do you perform on a middle-tier server?
What communication method are you using - WCF, remoting, web services, etc?
How much is performance a factor, and how often do you roundtrip to the server?
Is there a benefit in moving business logic onto a separate tier, or is it more practical to host components locally on a PC (and just make sure you have a good deployment model to push out regular releases as business rules change)?
Alternatively, should I be guiding customers away from WinForms completely if these factors are involved? With alternatives such as Silverlight and even ASP.NET w/ AJAX, the reasons to choose a WinForms client seem to be shrinking.
What is important to keep in mind is that there is a trade-off between ease of development and all of the scalability benefits etc. that a separate middle tier brings. What I mean by this is that you have to refresh interface mappings etc. in your code, and you have to deploy a middle tier somewhere for your testers to use, which needs to be refreshed as well. Furthermore, if you are lazy like me and pass your Entity Framework objects around directly, you cannot serialise them to a middle tier, so you then need to create DTOs for all of your operations.
Some of this overhead can be handled by a decent build system, but that also needs effort to set up and maintain.
My preferred tactic is to keep physical separation in terms of assemblies (i.e. I have a separate business logic / data access assembly) and to route all of the calls to the business layer through an interface layer, which is a bunch of Facade classes. All of these assemblies reside within my Windows app. I also create interfaces for all of these facades and access them through a factory.
That way, should I ever need the separation of a middle tier, and the trade-off in terms of productivity is worth it, I can separate my business layer out, put it behind a WCF service (as that is my preferred service platform) and perform some refactorings in terms of what my facades hand out, and what they do with what they accept.
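The answer above describes a .NET/WCF setup; purely to illustrate the shape of the "facade behind an interface, obtained through a factory" arrangement, here is a small Java sketch with invented type names:

```java
// All names here are invented for illustration; only the interface and the
// factory need to be visible to the UI layer.
interface OrderFacade {
    void placeOrder(String customerId, String productId, int quantity);
}

// In-process implementation living alongside the data-access code.
class LocalOrderFacade implements OrderFacade {
    @Override
    public void placeOrder(String customerId, String productId, int quantity) {
        // validate, apply business rules, talk to the data-access layer...
    }
}

// The UI only ever asks the factory, so swapping in a remote, service-backed
// implementation later means changing this one place rather than every caller.
final class Facades {
    static OrderFacade orders() {
        return new LocalOrderFacade();
        // later, perhaps: return new RemoteOrderFacade(serviceClient);
    }
}
```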
This is one example of why I tend to always do work in a web environment. If the network is available to your client application for service calls, it's certainly available to send and retrieve data from a web server.
Certainly, client restrictions can alter your final path, but when given the chance to influence the direction, I steer towards web-based solutions. There are deployment technologies available that give you an easier upgrade path than a traditional desktop package, but there's no real substitute for updating a server-based application.
Depending on your application, there are several performance issues to keep in mind.
If your work is very similar between various clients (shared data), then doing the processing on a middle tier is better, because you can make use of caching to cut down on the overall processing (see the sketch below).
If your work is different between clients (private data), then you won't gain much by doing the processing on a middle tier.
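A toy sketch of that middle-tier caching idea for shared data (names invented; a real system would use an expiring cache rather than a plain map): the first client pays for the expensive work, and every later client is served from the cached copy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: loadReportFromDatabase stands in for the expensive, shared work.
public class ReportService {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String getReport(String reportId) {
        // Compute once per key; all subsequent clients reuse the cached result.
        return cache.computeIfAbsent(reportId, this::loadReportFromDatabase);
    }

    private String loadReportFromDatabase(String reportId) {
        // placeholder for a slow query against the shared data
        return "report-" + reportId;
    }
}
```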
