etcd DB cluster on kubernetes misbehaving - database

In my project we have an etcd DB deployed on Kubernetes on-prem (this etcd is for application use, separate from the Kubernetes etcd). I deployed it using the Bitnami Helm chart as a StatefulSet. Initially the number of replicas was 1, since we only wanted a single etcd instance at the time.
The real problem started when we scaled it up to 3. I updated the configuration by adding the DNS names of the two new members to ETCD_INITIAL_CLUSTER:
etcd-0=http://etcd-0.etcd-headless.wallet.svc.cluster.local:2380,etcd-1=http://etcd-1.etcd-headless.wallet.svc.cluster.local:2380,etcd-2=http://etcd-2.etcd-headless.wallet.svc.cluster.local:2380
Now when I exec into any of the etcd pods and run etcdctl member list I only get a list of members, and none of them is shown as leader, which is wrong. One of the three should be the leader.
Also, after running for some time these pods start logging heartbeat and server-overload warnings:
W | etcdserver: failed to send out heartbeat on time (exceeded the 950ms timeout for 593.648512ms, to a9b7b8c4e027337a
W | etcdserver: server is likely overloaded
W | wal: sync duration of 2.575790761s, expected less than 1s
I changed the heartbeat interval from its default accordingly; the number of warnings decreased, but I still get a few heartbeat warnings along with the others.
Not sure what the problem is here. Is it I/O that's causing it? If so, I'm not sure how to confirm that.
I would really appreciate any help with this.

I don't think 🤔 the heartbeats are the main problem; it also seems 👀 that the logs you are seeing are Warning logs. So it's possible that some heartbeats are missed here and there, but your node(s) are not crashing.
It's likely that when you changed the replica count, the new replicas did not join the cluster. So I would recommend following this guide to add the new members to the cluster. Basically, with etcdctl, something like this:
etcdctl member add node2 --peer-urls=http://node2:2380
etcdctl member add node3 --peer-urls=http://node3:2380
Note that you will have to run these commands from a pod that has access to all the etcd nodes in your cluster.
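Adapted to the DNS names from the question, that could look roughly like this (just a sketch; the endpoints and namespace are taken from the question above and may need adjusting to your setup):
# Run from a pod that can reach the existing member; --peer-urls is the peer URL of the member being added.
export ETCDCTL_API=3
etcdctl --endpoints=http://etcd-0.etcd-headless.wallet.svc.cluster.local:2379 \
  member add etcd-1 --peer-urls=http://etcd-1.etcd-headless.wallet.svc.cluster.local:2380
etcdctl --endpoints=http://etcd-0.etcd-headless.wallet.svc.cluster.local:2379 \
  member add etcd-2 --peer-urls=http://etcd-2.etcd-headless.wallet.svc.cluster.local:2380
# The new pods then need to start with ETCD_INITIAL_CLUSTER_STATE=existing so they join the cluster instead of bootstrapping a new one.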
You could also consider managing your etcd cluster with the etcd operator 🔧, which should be able to take care of scaling and the addition/removal of nodes.
✌️

Okay, I had two problems:
"failed to send out heartbeat" Warning messages.
"No leader election".
The next day I found out the reason for the second problem: I had this startup parameter set in the pod definition:
ETCDCTL_API: 3
so when I run "etcdctl member list" with API v3 it doesn't show which member is elected as leader.
$ ETCDCTL_API=3 etcdctl member list
3d0bc1a46f81ecd9, started, etcd-2, http://etcd-2.etcd-headless.wallet.svc.cluster.local:2380, http://etcd-2.etcd-headless.wallet.svc.cluster.local:2379, false
b6a5d762d566708b, started, etcd-1, http://etcd-1.etcd-headless.wallet.svc.cluster.local:2380, http://etcd-1.etcd-headless.wallet.svc.cluster.local:2379, false
$ ETCDCTL_API=2 etcdctl member list
3d0bc1a46f81ecd9, started, etcd-2, http://etcd-2.etcd-headless.wallet.svc.cluster.local:2380, http://etcd-2.etcd-headless.wallet.svc.cluster.local:2379, false
b6a5d762d566708b, started, etcd-1, http://etcd-1.etcd-headless.wallet.svc.cluster.local:2380, http://etcd-1.etcd-headless.wallet.svc.cluster.local:2379, true
So when I use API v2 I can see which node is elected as leader, and there was no problem with leader election. I'm still working on the heartbeat warnings, but I guess I need to tune the config in order to avoid those.
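For reference, the leader can also be seen with the v3 API via endpoint status (a quick check, assuming the client port 2379 is reachable from inside the pod):
ETCDCTL_API=3 etcdctl --endpoints=http://etcd-0.etcd-headless.wallet.svc.cluster.local:2379 \
  endpoint status --cluster -w table
# The table output includes an IS LEADER column; in recent etcd versions the trailing true/false
# in 'member list' is the IS LEARNER flag, not the leader flag.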
NB: I have 3 nodes, stopped one for testing.

Related

PVC increase | Database mirroring

I am increasing the persistent volume of our INT database from 30Gi to 40Gi. Upon releasing the Helm changes I get the error below:
Error: UPGRADE FAILED: cannot patch "dbname-dbname-db" with kind StatefulSet: StatefulSet.apps "dbname-dbname-db" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden
I did a bit of research and figured out that I need to stop the pod connecting to the database and/or delete the current dbname-dbname-db StatefulSet to apply the changes.
This seems to pose a risk of downtime in our environment.
Can you please suggest possible ways of updating the PVC capacity without downtime in the environment?
I read about database mirroring but it's not clear to me how to implement it. We run our application in a Kubernetes cluster; I would appreciate it if someone could show me the kubectl commands to create a database mirror.
Below is part of the values.YAML.gotmpl file for the Helm release:
volumeMounts: {{- toYaml .volumeMounts | nindent 4 }}
volumes: {{- toYaml .volumes | nindent 4 }}
postgresql:
  postgresqlUsername: "user1"
  postgresqlDatabase: {{ .database.name | quote }}
  existingSecret: {{ .database.existingSecret | quote }}
  persistence:
    enabled: true
    size: 30Gi   # ----> value being changed to 40Gi
  nameOverride: "dbname-db"
  metrics:
    enabled: true
  networkPolicy:
    enabled: false
    allowExternal: true
image:
  replicaCount: {{ .image.replicaCount }}
  repository: itops.company.io
  imageName: company/app-backend
  tag: {{ .image.tag | quote }}
Let's split your issue into two separate problems:
1. Increase your volume size.
2. Update the value in your values.YAML.gotmpl file to reflect the actual size.
According to the error you get, it appears that the size in the values file is templated into the StatefulSet configuration under the field volumeClaimTemplates. However, this field currently cannot be patched (as of April 2022 there is an open issue in Kubernetes about it).
The way to resize the PVC is by patching the PVC itself.
Kubernetes may let you increase the PVC size without restarting the pod that uses it. This depends on your Kubernetes version, the type of storage, and which flags the cluster administrator has set.
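A quick way to check whether your cluster supports online expansion for the relevant storage class (the storage class name is whatever your PVCs reference):
# List storage classes, then check the expansion flag on the one backing your database PVCs.
kubectl get storageclass
kubectl get storageclass <storage-class-name> -o jsonpath='{.allowVolumeExpansion}{"\n"}'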
You can try the following, assuming you have kubectl access to the namespace of your database:
Figure out the number of pods in your statefulset: kubectl get sts dbname-dbname-db
Figure out the PVC names by running kubectl describe statefulset dbname-dbname-db. You should see a section Volume Claims; the name field in this section is the prefix for the PVC name. The actual PVC name should be <prefix>-dbname-dbname-db-<i>, where i is the ordinal of the pod (out of the number of pods in this StatefulSet), starting with 0.
For each PVC, run kubectl patch pvc <pvc-name> -p '{"spec":{"resources":{"requests":{"storage":"40Gi"}}}}'
If the previous commands are successful, you can track the status of the resizing process by running kubectl describe pvc <pvc-name> and looking at the Conditions field. It can take several minutes for the process to complete; make sure it has finished before proceeding with the next steps.
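Put together, for one PVC this could look like the following (a sketch; the PVC name data-dbname-dbname-db-0 is hypothetical and just follows the <prefix>-<statefulset>-<ordinal> pattern described above):
PVC=data-dbname-dbname-db-0   # hypothetical name; check 'kubectl get pvc' for the real one
kubectl patch pvc "$PVC" -p '{"spec":{"resources":{"requests":{"storage":"40Gi"}}}}'
# Wait for the resize to finish before touching the StatefulSet.
kubectl describe pvc "$PVC" | grep -A5 Conditions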
The above process leaves you with the second problem - the code in your values file is not aligned with the actual volume size.
The solution for this problem is as follows, and must be done only after the previous process has completed:
Make sure you are ready to deploy your changes to the values file (30Gi -> 40Gi).
Delete the StatefulSet object, without deleting any running pod, by running kubectl delete statefulset dbname-dbname-db --cascade=orphan. This way, the working pods continue to run without interruption.
Deploy your changes. The statefulset will be created again and should be automatically associated with the running pod(s).
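In command form the second part could look roughly like this (a sketch; myrelease and ./chart are placeholders, and if you deploy through helmfile the last step is your usual helmfile apply):
# Remove only the StatefulSet object; --cascade=orphan keeps the pods (and their PVCs) running.
kubectl delete statefulset dbname-dbname-db --cascade=orphan
# Redeploy with the values file now set to 40Gi; the StatefulSet is recreated and adopts the running pods.
helm upgrade myrelease ./chart -f values.yaml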
Reference:
Expanding persistent volumes claims
Thanks @Meir, I followed your steps but encountered an error: the PVC could not be upgraded while it is attached to a node; you have to detach/stop the node.
After some research we did the steps below and the PVC increase was successful. We had to delete the pod associated with the PVC, so in our case we destroyed the entire release via Helm.
Destroy the release.
Apply this command: kubectl patch pvc <pvc-name> -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
After applying the patch the value may not immediately reflect the update; you will see a message about waiting for the pod to restart. In that case, deploy the release again (this recreates the pods), then run kubectl get pvc -n <namespace> and the change should be reflected.
This may not be the best solution, as there is downtime of at least 10 minutes, but it worked for us.

Cassandra nodes out of sync - NTP Out Of Sync Issue

We have a Cassandra cluster of 4 nodes, and it was working perfectly. After 2 of the nodes got restarted (they are LXCs on the same machine), those 2 nodes are not able to join the cluster and fail with the error:
ERROR [MigrationStage:1] 2014-07-06 20:34:36,994 MigrationTask.java (line 55) Can't send migration request: node /X.X.X.93 is down.
The two nodes that were not restarted show the restarted ones as DN in nodetool status, while the restarted ones show the others as UN.
I've checked the gossipinfo and that is fine.
Can anybody help me on this?
I suppose you have cross_node_timeout = true and the time between your servers is not in sync. You might want to check your NTP settings.
The new nodes might be dropping the data requests they get from the older nodes; hence NTP should be configured on all of the Cassandra nodes.
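A quick way to confirm (or rule out) clock drift is to compare the clocks and NTP peers on every node (a sketch; assumes ntpd/ntpq are installed, and the cassandra.yaml path may differ on your install):
# Compare wall-clock time and NTP peer offsets across all nodes.
date
ntpq -p    # offsets should be a few ms, with at least one peer marked '*'
# cross_node_timeout lives in cassandra.yaml; with unsynchronized clocks it makes nodes drop requests.
grep cross_node_timeout /etc/cassandra/cassandra.yaml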

Nagios conditional checks

Currently I am monitoring my target Windows hosts for a bunch of services (CPU, memory, disks, SSL certs, HTTP, etc.). I'm using nsclient as the agent that the Nagios server talks to.
My problem is that I deploy to those hosts three times every 24 hours. The deployment process requires the hosts to reboot. Whenever my hosts reboot I get nagios alerts for each service. This means a large volume of alerts, which makes it difficult to identify real issues.
Ideally I'd like to do this:
If the host is down, don't send any alerts for the rest of the services
If the host is rebooting, this means that nsclient is not accessible. I want to receive only one alert (e.g. CPU is not accessible) and mute everything else for a few minutes, so the host can finish booting and nsclient becomes available again.
Implementing this would mean I get one email per host for each deployment. That is much better than everything turning red and me getting flooded with alerts that aren't worth checking (since they're only sent because the Nagios client - nsclient - is not available during the reboot).
Got to love using a windows stack...
There are several ways to handle this.
If your deployments happen at the same time every day:
1. you could modify your active time period to exclude those times (or)
2. schedule down time for your host via the Nagios GUI
If your deployments happen at different/random times, things become a bit harder to work around:
1. when nrpe or nsclient is not reachable, Nagios will often throw an 'UNKNOWN' alert for the check. If you remove the 'u' option for the following entries:
host_notification_options [d,u,r,f,s,n]
service_notification_options [w,u,c,r,f,s,n]
That would prevent the 'UNKNOWN's from sending notifications. (or)
2. dynamically modify active checking of the impacted checks by 'turning them off' before you start the deployment, and then 'turning them on' after the deployment. This can be automated using the Nagios 'external commands file'; see the sketch below.
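For option 2, the deployment job could toggle the checks through the external command file, for example (a sketch; the command-file path and the host name winhost01 are assumptions for illustration):
# Before the deployment/reboot: stop actively checking all services on the host.
echo "[$(date +%s)] DISABLE_HOST_SVC_CHECKS;winhost01" > /usr/local/nagios/var/rw/nagios.cmd
# ... deploy and reboot ...
# After the host is back and nsclient responds again: re-enable the checks.
echo "[$(date +%s)] ENABLE_HOST_SVC_CHECKS;winhost01" > /usr/local/nagios/var/rw/nagios.cmd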
Jim Black's answer would work, or if you want to go even more in depth you can define dependencies with service notification escalation as described in the documentation below.
Escalating the alerts would mean that you could define: CPU/SSL etc. check fails -> check whether the host is down -> notify/don't notify.
Nagios Service Escalation (3.0)
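A dependency that suppresses the per-service noise while the host-level check is failing might look like this (a sketch with hypothetical host/service names; see the linked documentation for the exact semantics):
define servicedependency {
    host_name                       winhost01
    service_description             Host Reachability
    dependent_host_name             winhost01
    dependent_service_description   CPU Load,Memory,Disks,SSL Certs,HTTP
    notification_failure_criteria   w,u,c
}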

Shutdown EC2 Instance if idle right before another billable hour

At unpredictable times (user request) I need to run a memory-intensive job. For this I get a spot or on-demand instance and mark it with a tag as non_idle. When the job is done (which may take hours), I give it the tag idle. Due to the hourly billing model of AWS, I want to keep that instance alive until another billable hour is incurred, in case another job comes in. If a job comes in, the instance should be reused and marked as non_idle. If no job comes in during that time, the instance should terminate.
Does AWS offer a ready-made solution for this? As far as I know, CloudWatch can't set alarms that run at a specific time, never mind ones based on CPUUtilization or the instance's tags. Otherwise, perhaps I could simply set up, for every created instance, a Java timer or Scala actor that runs every hour after the instance is created and checks for the tag idle.
There is no readily available AWS solution for this fine-grained optimization, but you can indeed use the existing building blocks to build your own based on the launch time of the current instance (see Dmitriy Samovskiy's smart solution for deducing How Long Ago Was This EC2 Instance Started?).
Playing 'Chicken'
Shlomo Swidler has explored this optimization in his article Play “Chicken” with Spot Instances, albeit with a slightly different motivation in the context of Amazon EC2 Spot Instances:
AWS Spot Instances have an interesting economic characteristic that
make it possible to game the system a little. Like all EC2 instances,
when you initiate termination of a Spot Instance then you incur a
charge for the entire hour, even if you’ve used less than a full hour.
But, when AWS terminates the instance due to the spot price exceeding
the bid price, you do not pay for the current hour.
The mechanics are the same of course, so you might be able to simply reuse the script he assembled, i.e. execute this script instead of or in addition to tagging the instance as idle:
#! /bin/bash
t=/tmp/ec2.running.seconds.$$
if wget -q -O $t http://169.254.169.254/latest/meta-data/local-ipv4 ; then
  # add 60 seconds artificially as a safety margin
  let runningSecs=$(( `date +%s` - `date -r $t +%s` ))+60
  rm -f $t
  let runningSecsThisHour=$runningSecs%3600
  let runningMinsThisHour=$runningSecsThisHour/60
  let leftMins=60-$runningMinsThisHour
  # start shutdown one minute earlier than actually required
  let shutdownDelayMins=$leftMins-1
  # arithmetic comparison (-gt/-lt) so two-digit minute values compare numerically, not as strings
  if [[ $shutdownDelayMins -gt 1 && $shutdownDelayMins -lt 60 ]]; then
    echo "Shutting down in $shutdownDelayMins mins."
    # TODO: Notify off-instance listener that the game of chicken has begun
    sudo shutdown -h +$shutdownDelayMins
  else
    echo "Shutting down now."
    sudo shutdown -h now
  fi
  exit 0
fi
echo "Failed to determine remaining minutes in this billable hour. Terminating now."
sudo shutdown -h now
exit 1
Once a job comes in you could then cancel the scheduled termination instead of or in addition to tagging the instance with non_idle as follows:
sudo shutdown -c
This is also the 'red button' emergency command during testing/operation, see e.g. Shlomo's warning:
Make sure you really understand what this script does before you use
it. If you mistakenly schedule an instance to be shut down you can
cancel it with this command, run on the instance: sudo shutdown -c
Adding CloudWatch to the game
You could take Shlomo's self-contained approach even further by integrating with Amazon CloudWatch, which recently added an option to Use Amazon CloudWatch to Detect and Shut Down Unused Amazon EC2 Instances; see the introductory blog post Amazon CloudWatch - Alarm Actions for details:
Today we are giving you the ability to stop or terminate your EC2
instances when a CloudWatch alarm is triggered. You can use this as a
failsafe (detect an abnormal condition and then act) or as part of
your application's processing logic (await an expected condition and
then act). [emphasis mine]
Your use case is listed in section Application Integration specifically:
You can also create CloudWatch alarms based on Custom Metrics that you
observe on an instance-by-instance basis. You could, for example,
measure calls to your own web service APIs, page requests, or message
postings per minute, and respond as desired.
So you could leverage this new functionality by Publishing Custom Metrics to CloudWatch to indicate whether an instance should terminate (is idle), based on Dmitriy's launch time detection, and reset the metric again once a job comes in and the instance should keep running (is non_idle). This way EC2 would take care of the termination, two out of three automation steps would be moved from the instance into the operations environment, and management and visibility of the automation process would improve accordingly.
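Publishing such a custom idle metric from the instance can be done with the AWS CLI, roughly like this (a sketch; the namespace, metric name and the alarm wiring are assumptions for illustration, not part of the original article):
# Report this instance as idle (1) or busy (0); a CloudWatch alarm on this metric
# with a terminate action can then shut the instance down after N idle periods.
INSTANCE_ID=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)
aws cloudwatch put-metric-data \
  --namespace "Custom/Jobs" \
  --metric-name Idle \
  --dimensions InstanceId=$INSTANCE_ID \
  --value 1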

Drupal website blocked because of many connection errors - website goes offline

From time to time, the number of database connections from our Drupal 6.20 system to our MySQL database reaches 100-150, and after a while the website goes offline. The error message when trying to connect to MySQL manually is "blocked because of many connection errors. Unblock with 'mysqladmin flush-hosts'". Since the database is hosted on Amazon RDS I don't have permission to issue this command, but I can reboot the database, and once rebooted the website works normally again. Until the next time.
Drupal reports multiple errors prior to going offline, of two types:
Duplicate entry '279890-0-all' for key 'PRIMARY' query: node_access_write_grants /* Guest : node_access_write_grants */ INSERT INTO node_access (nid, realm, gid, grant_view, grant_update, grant_delete) VALUES (279890, 'all', 0, 1, 0, 0) in /var/www/quadplex/drupal-6.20/modules/node/node.module on line 2267.
Lock wait timeout exceeded; try restarting transaction query: content_write_record /* Guest : content_write_record */ UPDATE content_field_rating SET vid = 503621, nid = 503621, field_rating_value = 1212 WHERE vid = 503621 in /var/www/quadplex/drupal-6.20/sites/all/modules/cck/content.module on line 1213.
The nids in these two queries are always the same and refer to two nodes that are frequently and automatically updated by a custom module. I can see a correlation between these errors and unusually many web requests in the Apache logs. I would understand the website becoming slower because of this. But:
Why do these errors occur, and how can they be solved? It seems to me it's to do with several web requests trying to update the same node at the same time. But surely Drupal should deal with this by locking the tables etc? Or should I deal with it in some special way?
Despite the higher web load, why does the database completely lock up and need to be rebooted? Wouldn't it be better if the website still had access to MySQL so that, once the load is lower, it can serve pages again? Is there some setting for this?
Thank you!
This can usually be resolved by checking one or all of these three things:
Are you out of disk space? From SSH, run df -h and make sure you still have disk space.
Are the tables damaged? Repair the tables in phpMyAdmin, or see the CLI instructions here: http://dev.mysql.com/doc/refman/5.1/en/repair-table.html
Have you performance-tuned your MySQL with an /etc/my.cnf? See this for more ideas: http://drupal.org/node/51263
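On the MySQL side, two variables relate directly to the errors quoted above (a sketch; the values are only examples, and on RDS these are normally changed through a DB parameter group rather than SET GLOBAL):
# Raise the threshold that triggers "blocked because of many connection errors".
mysql -h <rds-endpoint> -u <user> -p -e "SET GLOBAL max_connect_errors = 10000;"
# Give long UPDATEs under load more time before "Lock wait timeout exceeded".
mysql -h <rds-endpoint> -u <user> -p -e "SET GLOBAL innodb_lock_wait_timeout = 120;"
# See how close you get to the connection limit during spikes.
mysql -h <rds-endpoint> -u <user> -p -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'; SHOW VARIABLES LIKE 'max_connections';"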
