Is running a database in Kubernetes an antipattern? [closed]

Let's say we are running some services in a Kubernetes cluster and one of them requires a PostgreSQL instance, expected to persist data reliably. Should the DB live in the cluster or be configured separately?
Imagine that the DB is deployed in the cluster. This probably means one of the following:
1. We need a process for migrating the data to another node in case the current one goes down. This sounds like a non-trivial task. Or:
2. The node where the DB lives has to be treated in a special way: horizontal scaling must be constrained to the other nodes, and the cluster ceases to be homogeneous. This might be seen as a design flaw, going against the spirit of maintaining disposable, replaceable containers.
Point (1) applies only to self-managed clusters where all the storage at our disposal is tied to the machines where the nodes run. If we are using a managed cloud, we can use persistent volume claims and a new instance can pick up the data automatically. Still, this means that if the node with the DB is removed, we will suffer database downtime until a new instance comes up. So point (2) also remains valid for managed K8s offerings.
Therefore I can well understand the argument for keeping the DB outside of Kubernetes. What would some counterarguments look like? There are a lot of official Helm charts for various DBs, which suggests that people do keep their DBs in Kubernetes clusters after all.
Happy to learn some critical thoughts!

This is not an anti-pattern. It is just difficult to implement and manage.
Point 1
Even in a self-hosted cluster you can have persistent volume storage provisioned through GlusterFS or Ceph, so you don't always have to use ephemeral storage. Point 1 is therefore not fully valid.
DBs are generally created as StatefulSets, where every instance gets its own copy of the data.
Point 2
When your DB cluster scales horizontally, the 'init' container of the new DB pod (or a CRD/operator provided by the DB) needs to register the new 'secondary' pod so that it becomes part of your DB cluster.
A StatefulSet also needs to run behind a headless service, so the IP of each endpoint is known at all times for cluster health checks, primary->secondary data sync, and electing a new primary in case the primary node goes down.
So, as long as the new pods register themselves with the DB cluster, you will be fine running your DB workload inside a Kubernetes cluster.
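To make this more concrete, here is a minimal sketch of what such a registration step might look like as a script run from an init container. It is not taken from any particular chart or operator; the StatefulSet name (postgres), the headless Service (postgres-headless), the replicator role and the data directory are all assumptions for the example:

```python
# Hypothetical registration step run inside each pod of a "postgres" StatefulSet.
# The headless Service "postgres-headless" gives every pod a stable DNS name,
# so a new secondary can find the primary (ordinal 0) and start replicating.
import socket
import subprocess

hostname = socket.gethostname()            # e.g. "postgres-2"
ordinal = int(hostname.rsplit("-", 1)[1])  # ordinal assigned by the StatefulSet

primary = "postgres-0.postgres-headless.default.svc.cluster.local"

if ordinal == 0:
    print("This pod is the primary; nothing to register.")
else:
    # Seed the standby from the primary and configure streaming replication.
    # (pg_basebackup -R writes the standby configuration for us.)
    subprocess.run(
        ["pg_basebackup", "-h", primary, "-U", "replicator",
         "-D", "/var/lib/postgresql/data", "-R"],
        check=True,
    )
    print(f"Registered {hostname} as a secondary of {primary}")
```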
Further reading: https://devopscube.com/deploy-postgresql-statefulset/

Related

MongoDB - set replication to DocumentDB

We're setting up a local MongoDB cluster. Locally, we'll have one primary and one secondary node, and we want to have another node in AWS. Is it possible to have that node be the DocumentDB service instead of an EC2 instance?
Also, I know I must have an odd number of total nodes; is it possible to first add one node and then add another one?
Thanks ahead.
Also, I know I must have an odd number of total nodes
In a MongoDB replica set, you can have any number of nodes you like. It is possible to have a 2-node replica set, although it's not very practically useful since unavailability of a single node (e.g. a restart for maintenance) would make the whole deployment unavailable for writes. A 4-node replica set is a feasible construction if you wanted an additional replica somewhere (e.g. for geographically close querying from a secondary, or for analytics querying), though if you are simply doing this for redundancy you should probably stick with the standard 3-node configuration and configure proper backups.
Is it possible to first add one node and then add another one?
You can reconfigure a replica set at any time.
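As a rough sketch of what that looks like in practice (shown with PyMongo; the host names are placeholders), you fetch the current config, append the new member, bump the version, and submit it again; repeat later for the next node:

```python
# Minimal sketch: add one member to an existing replica set.
from pymongo import MongoClient

client = MongoClient("mongodb://current-primary:27017", directConnection=True)

cfg = client.admin.command("replSetGetConfig")["config"]
cfg["version"] += 1                       # every reconfig must bump the version
cfg["members"].append({
    "_id": max(m["_id"] for m in cfg["members"]) + 1,
    "host": "new-node.example.com:27017",
})

client.admin.command("replSetReconfig", cfg)
```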
Is it possible to have that node as the DocumentDB service instead of an EC2 instace?
Unlikely. DocumentDB is not MongoDB. DocumentDB presents itself as MongoDB-compatible, but 1) it emulates an old version of MongoDB, 2) even then many features don't work, and 3) it's nowhere near the same architecture as MongoDB under the hood. So when you ask a genuine MongoDB deployment to work with a DocumentDB node, this will probably not work.
This assumes you can even configure DocumentDB in the required manner - I suspect this won't be possible to begin with.
If you're only trying to replicate the data to DocumentDB, Database Migration Service is a good tool for the job: https://aws.amazon.com/dms/
But as others have said, this will be a separate cluster from your MongoDB setup.

How to use Chaos Monkey on local cluster

I have a cluster back in my office for testing purposes. It hosts a database, and I would like to unleash all kinds of "monkey business" on those test machines long before I go to production.
I sipped 2-3 coffees this morning trying to figure out HOW to get the "inFAMOUS" Simian Army to chew on my nerves here on my local machines.
Everywhere I read, I saw all kinds of setups for AWS.
Question: Is it possible to deploy the Monkeys on my local cluster? Or is there any other alternative to the Simian Army?
What is the question that you want to answer with your tests?
Chaos Monkey is a resilience tool that was designed for the cloud. Its main purpose is to verify that AWS Auto Scaling Groups (ASGs) can re-provision faulty/offline nodes and that the application keeps performing in a stable way while this process is going on. Do you have an automated process like this in your local cluster?
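If your office cluster happens to run Kubernetes, one low-tech alternative to the Simian Army is a small script of your own that randomly deletes pods and lets you watch whether the workload recovers. A sketch using the official Python client (the namespace and label selector are placeholders):

```python
# Tiny DIY "chaos monkey": pick one pod matching a label and delete it,
# then observe whether the Deployment/StatefulSet brings a replacement up.
import random
from kubernetes import client, config

config.load_kube_config()                  # uses your local kubeconfig
v1 = client.CoreV1Api()

pods = v1.list_namespaced_pod("default", label_selector="app=my-test-db").items
if pods:
    victim = random.choice(pods)
    print(f"Deleting pod {victim.metadata.name}")
    v1.delete_namespaced_pod(victim.metadata.name, "default")
```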

Peer to peer replication of local databases

I have a program in C that monitors traffic and records the URLs visited by the user. Currently, I am maintaining this in a hash table: the key is the source IP address and the value is a data structure with a linked list of URLs. I currently maintain 50k to 100k records in the hash table. When the user logs out, the record can be deleted.
The program runs independently on an Active-Standby pair. I want to replicate this database to another machine in case my primary machine crashes (the two systems act as client and server) and continue recording data associated with the user.
The hard way is to write code for sending this information to the peer, and on the peer system to receive and store it. The issue is, it would add lots of code (and bugs!). For the data replication and data store, here are a few prerequisites:
I want data-record replication between these machines. I am NOT looking at adding another machine/cluster unless required.
Prefer a library so that queries are quick; if not, another process on the same machine with which I can do IPC.
Add, update and delete operations should be supported.
An in-memory database is a must.
Support multiple such databases with different keys.
Something that has publish/subscribe.
Resync capability if the backup dies and comes back again.
The interface should be in C.
Possible options I looked at were ZooKeeper, Redis, Memcached, SQLite and Berkeley DB.
ZooKeeper - Needs an odd number of systems for tie-breaking; not suitable for a 1-to-1 setup.
Redis - Looks to fit my requirements, with hiredis for the C interface. A separate process, though.
Memcached - I don't have any caching requirements.
SQLite - Embedded database with a C interface.
Berkeley DB - Embedded database with better scaling.
So Redis, SQLite and Berkeley DB look like my options to go forward. I'd appreciate any help/thoughts on which DBs I should research further for my requirements, or whether there are other DBs I should look at. I apologize if my question is very generic; if it does not belong here, please point me to the right forum.
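To illustrate why Redis looks like a reasonable fit for these requirements, here is a small sketch of the mechanism using redis-py (hiredis exposes the same commands from C; the host names and key scheme are made up for the example):

```python
# Sketch: active/standby replication plus pub/sub with Redis.
# "active-host" / "standby-host" and the "urls:<src-ip>" keys are placeholders.
import redis

primary = redis.Redis(host="active-host", port=6379)
standby = redis.Redis(host="standby-host", port=6379)

# One-time setup: make the standby replicate from the primary. If the standby
# dies and comes back, Redis resynchronizes it (partially or fully) on its own.
standby.slaveof("active-host", 6379)

# Add / update / delete are just writes on the primary; replication is automatic.
primary.rpush("urls:10.0.0.5", "http://example.com/a")
primary.rpush("urls:10.0.0.5", "http://example.com/b")
primary.delete("urls:10.0.0.5")          # e.g. the user logged out

# Publish/subscribe: messages published on the primary also reach subscribers
# connected to the replica.
sub = standby.pubsub()
sub.subscribe("user-logout")
primary.publish("user-logout", "10.0.0.5")
print(sub.get_message(timeout=1.0))       # subscribe confirmation
print(sub.get_message(timeout=1.0))       # the logout event
```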

Disadvantages of SQL Server Service Broker [closed]

I have been doing R&D on using SQL Server Service Broker to replace our current messaging solution, MSMQ. I want to know the disadvantages of SQL Server Service Broker in contrast to MSMQ for the following criteria.
Development
Troubleshooting
Performance (let's say we need to process 100,000 messages daily, with an average size of around 25 KB)
Scalability
I've used Service Broker in my current project, having previously used MSMQ (brokered by MassTransit) with great success. I was initially dubious about using Service Broker, but I have to admit it has performed very well.
If you're using the publish/subscribe model, then I would use message queueing every time (though I would use RabbitMQ over MSMQ if the project allowed), but when you just want to chew through a stack of data and persist it to SQL Server, then Service Broker is a great solution: the fact that it's so 'close to the metal' is a big advantage.
Development
Service Broker requires a lot of boilerplate, which is a pain, but unless you're planning on having lots of different queues it's manageable. SQL Server projects in Visual Studio take away a lot of the deployment pain.
Troubleshooting
Service Broker is a black box - messages go in, and they usually come out, but if they don't then troubleshooting can be problematic: all you can do is query the system views, and sometimes you simply can't find out what has gone wrong. This is annoying, but MSMQ has the same kind of issues.
Performance
Service Broker performance is excellent for our use case (see comment section below for discussion). We are processing a lot more than 100,000 messages per day, more than 30,000 per hour at our SLA load, and our message sizes are large. I would estimate we process close to 100,000 messages per hour during heavy load testing.
For best performance I would advise you to use a dialog pool like this one [1], as creating a Service Broker dialog can be an expensive operation.
You will also want to use the error handling procedures detailed by Remus Rusanu. (If you do use Service Broker, you might as well read everything Remus has written on the subject before you start, as you'll end up reading it eventually!)
Scalability
You can certainly use more than one server to scale up if required, though we haven't had to do so, and from the load size you mention I don't think you would need to either.
I don't think I have really managed to answer your question, as I haven't highlighted enough disadvantages of Service Broker queues. I would say the impenetrable nature of its internal workings is the thing that most annoys me - when it works, it works very well, but when it stops working it can be very hard to figure out why. Also, if you have a lot of messages in a queue, using ALTER QUEUE takes a very long time to complete.
Not knowing how you are using MSMQ also makes it difficult to fairly compare the two technologies.
[1] Recreated in a gist as the original URL is now "disabled" and the page isn't in the Internet Archive. Eventually found a copy here.

Does memcached share across servers in google app engine?

On the memcached website it says that memcached is a distributed memory cache. This implies that it can run across multiple servers and maintain some sort of consistency. When I make a request in Google App Engine, there is a high probability that requests in the same entity group will be serviced by the same server.
My question is, say there were two servers servicing my request, is the view of memcached from these two servers the same? That is, do things I put in memcached in one server reflected in the memcached instance for the other server, or are these two completely separate memcached instances (one for each server)?
Specifically, I want each server to actually run its own instance of memcached (no replication in other memcached instances). If it is the case that these two memcached instances update one another concerning changes made to them, is there a way to disable this?
I apologize if these questions are stupid, as I just started reading about it, but these are initial questions I have run into. Thanks.
App Engine does not really use memcached, but rather an API-compatible reimplementation (chiefly by the same guy, I believe -- in his "20% time";-).
Cached values may disappear at any time (via explicit expiration, a crash in one server, or due to memory scarcity in which case they're evicted in least-recently-used order, etc), but if they don't disappear they are consistent when viewed by different servers.
The memcached server chosen doesn't depend on the entity group that you're using (the entity group is a concept from the datastore, a different beast).
Each server runs its own instance of memcached, and each server will store a percentage of the objects that you store in memcache. The way it works is that when you use the memcached API to store something under a given key, a memcached server is chosen (based on the key).
There is no replication between memcached instances; if one of those boxes goes down, you lose 1/N of your memcached data (N being the number of memcached instances running in App Engine).
Typically, memcached does not share data between servers. The application server hashes the key to choose a memcached server, and then communicates with that server to get or set the data.
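As a toy illustration of that key-to-server mapping (this is not App Engine's actual implementation, and the server addresses are made up):

```python
# Toy sketch of how a memcached client maps a key to one of N servers.
# Real clients typically use consistent hashing so that adding or removing a
# server only remaps a fraction of the keys.
import hashlib

servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]  # placeholders

def pick_server(key: str) -> str:
    digest = hashlib.md5(key.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(pick_server("user:1234:profile"))   # same key always maps to the same server
print(pick_server("session:abcd"))        # possibly a different server
```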
Based on what I know, there is only ONE Memcache instance for your entire application. There may be many instances of your code running, each with its own memory, and many datastores around the world, but there is only one Memcache service at a time. Keep in mind that this service is susceptible to failure, and there is no SLA for it.
