Postgres DB redundancy/HA server-side (no pgpool or similar)

I'll start with my question: how can I set up redundancy with several DBs without having to install anything special on each of the clients?
I have seen that the usual way to do it is to install something like pgpool on each client, which decides whether to write to the master or to pick a slave for a read. I would like to find a way to move that responsibility to the server side, either by having a very small machine dedicated to that function (effectively acting as a load balancer) or by using something that can be set up on the DBs themselves.
The clients would then only connect to http://IP_BALANCER, and that machine would route all the operations accordingly, exactly as pgpool would do. Is that possible? If so, is that a good way to do it?
I am asking this question because the 'clients' would be a set of very light pods on a Kubernetes cluster. I managed to make the pods very minimal (they run with only compiled machine code) and I wouldn't want to have to add pgpool on top of them, since I would have to start adding an actual OS to be able to achieve that.

Related

Nagios: check multiple services simultaneously?

I've just started using Nagios to monitor a group of broadcast transmitters. Each transmitter is defined as a host, and each aspect of the transmitter I wish to monitor (RF forward, RF reflected, power supply voltages, etc) is defined as a service. In doing so, I can get an alarm if any of these aspects are out of tolerance, and can use the performance data to graph each aspect (using pnp4nagios, in this case).
To check the transmitters' telemetry data, I wrote some scripts, one to address the unique facilities of each make/model of transmitter involved. In keeping with the way I've seen other Nagios checks work, an argument to the script allows you to select which aspect you want reported.
At first I was content with this. It worked like any more-traditional use of Nagios I'd encountered. But then I hit a snag.
Because each service check is scheduled individually, diagnosing an alarm condition can be tricky, since the various services aren't all being checked at the same time - and therefore the set of values I'm looking at is unlikely to be time-aligned. If all the service check values were from the same moment in time, it would be easier to detect correlations (since the set of values would essentially be a snapshot).
My first thought would be to deal with this by running a single instance of a single command, which would return values for multiple services. This would also seem far more efficient than opening as many connection instances as there are services to be checked. From a scripting perspective, this is easily done. But from a Nagios config perspective, I don't know how (or if?) you'd do that.
I know I could also divorce the data collection from the Nagios check, caching the telemetry values all at once periodically, and feeding Nagios values from the cache. But I don't want to introduce added delays if I can help it.
Thoughts?

My first thought would be to deal with this by running a single instance of a single command, which would return values for multiple services. This would also seem far more efficient than opening as many connection instances as there are services to be checked. From a scripting perspective, this is easily done. But from a Nagios config perspective, I don't know how (or if?) you'd do that.
There's nothing strange about this from a Nagios perspective, because what you're essentially doing is writing your own plugin, and plugins can be as general or specific as you want them to be.
When writing your own plugin, it's good to remember:
Your script is responsible for all failures, so make sure you handle garbage responses, failed connections and whatever other errors you predict may happen in the plugin itself, and exit with appropriate error levels.
Since you may encounter errors you didn't expect, it probably makes sense to have the plugin write what it's doing to a log file, as well as what responses it got.
The plugin must use exit codes to alert Nagios correctly. If you want performance data, it needs to be given in the correct syntax. See the development guidelines.
I'm considering submitting the service data passively. It would solve all the problems I mentioned. But it would create a few minor new ones - now there's external processes to keep running, and it's a little outside the mainstream way of doing things (might put a future admin through a little pain to figure out how it works).
I don't think this is a better solution than writing your own plugin, unless the data is coming from nodes actively pushing it out.
For example, in an IoT context, the nodes you are monitoring may actually be sending passive check results directly to the Nagios instance. In that setting, passive checks make sense, because you just want to take whatever someone else gives you and act when no results come in (freshness).
In your case, it sounds like writing your own script would take care of both the timing issue and whatever additional logic you want in your script. As far as Nagios is concerned, it only needs to run the script on a schedule, watch the exit codes, and act as configured if it fails.
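As a rough illustration of such a plugin, here is a minimal Python sketch that reads every value in one pass and reports them as a single result with performance data. The metric names, the thresholds, and the read_telemetry() placeholder are all assumptions made up for the example; only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the "text | perfdata" output shape come from the Nagios plugin conventions.

```python
"""Sketch of a Nagios plugin that checks one snapshot of several values."""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical thresholds: metric name -> (warn_above, crit_above).
THRESHOLDS = {
    "rf_forward_w": (950.0, 1000.0),
    "rf_reflected_w": (20.0, 50.0),
    "psu_volts": (50.0, 52.0),
}


def read_telemetry():
    """Placeholder: replace with ONE connection that fetches every value."""
    return {"rf_forward_w": 900.0, "rf_reflected_w": 25.0, "psu_volts": 48.1}


def evaluate(values, thresholds):
    """Return (exit_code, output_line) for one snapshot of values."""
    worst = OK
    problems = []
    perfdata = []
    for name, (warn, crit) in sorted(thresholds.items()):
        if name not in values:
            # A missing value means the check itself failed.
            return UNKNOWN, f"UNKNOWN - no value for {name}"
        value = values[name]
        state = CRITICAL if value > crit else WARNING if value > warn else OK
        if state != OK:
            problems.append(f"{name}={value}")
        worst = max(worst, state)
        # Perfdata entry: label=value;warn;crit
        perfdata.append(f"{name}={value};{warn};{crit}")
    summary = ", ".join(problems) if problems else "all values in tolerance"
    return worst, f"{STATE_NAMES[worst]} - {summary} | {' '.join(perfdata)}"


def main():
    """Print the status line and return the exit code for Nagios."""
    try:
        code, line = evaluate(read_telemetry(), THRESHOLDS)
    except Exception as exc:  # unexpected failures must still reach Nagios
        code, line = UNKNOWN, f"UNKNOWN - {exc}"
    print(line)
    return code
```

To run it as a real plugin, add import sys and end the script with sys.exit(main()), so that Nagios sees the exit code. Because all values come from one call to read_telemetry(), the performance data in each result is a time-aligned snapshot.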

How to transfer rules and configuration to edge devices?

In our application we have a server which stores entities, along with their relations and processing rules, in a DB. A number of clients (Raspberry Pis, gateways, Android apps) will be connected to that server.
I want to push configuration and processing rules to those clients, so that when they read some data they can process it on their own. This is to make the edge devices self-sustaining and to avoid outages when the server or network is down.
How should I push/pull the configuration? I don't want to maintain DBs on the clients and configure replication, because maintaining and patching DBs for that many clients will be tough.
Is there a better alternative?
At the same time I have to push logs to upstream (server).
Thanks in advance.

I have been there. You need an on-device data store. For this range of embedded Linux, in order of growing development complexity:
Variables: Fast to change and retrieve, makes sense if the data fits in memory. Lost if the process ends.
Filesystem: Requires no special libraries, just read/write access somewhere. Workable if the data is small enough to fit in memory and does not change much during execution (read on startup when lacking network, write on update from server). If your data can be structured as a few object variables, you could write them to JSON files, and there is plenty of documentation on other file storage options for Android apps.
In-memory datastore like Redis: Lightweight dependency, can automate messaging and filesystem-stored backup. Provides a managed framework/hybrid of the previous two.
Lightweight databases, especially SQLite: Lightweight SQL database, stored in one file and popular with Android apps (probably already installed on many of the target devices). It could work for frequent changes on a larger block of data in a memory-constrained environment, but does not look like a great fit. It gets worse for anything heavier.
Redis replication is easy, but indiscriminate, so mainly sensible if your devices receive a changing but identical ruleset. Otherwise, in all these cases, the easiest transfer option may be to request and receive the whole configuration (GET a string, download a JSON file, etc.) and parse the received values.
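A minimal Python sketch of that "fetch the whole configuration, fall back to the local copy" approach might look like the following. The function name load_config, the fetch callable, and the cache file are assumptions for illustration; on a real device fetch would wrap whatever HTTP or messaging call you use.

```python
import json
import os
import tempfile


def load_config(fetch, cache_path):
    """Return the newest config available, preferring the server.

    `fetch` is any callable returning the config as a JSON string; it is
    expected to raise OSError when the server or network is unreachable.
    """
    try:
        config = json.loads(fetch())
    except (OSError, ValueError):
        # Server unreachable or response unparsable: use the last good copy.
        with open(cache_path, encoding="utf-8") as fh:
            return json.load(fh)
    # Write to a temp file and rename it, so a crash mid-write
    # never leaves a half-written cache behind.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(config, fh)
    os.replace(tmp_path, cache_path)
    return config
```

The device calls load_config() on startup and on whatever refresh schedule suits it. If neither the server nor a cached file is available, the open() call raises, which is probably what you want on a device that has never been configured.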

2 Node Redis HA

I have two nodes which I want to run as servers in active-active mode with HA capability, i.e. if one is down, the other one should start receiving all the requests, but while both are up, both should be taking requests. Since Redis doesn't allow active-active mode for the same hash set, and I can't run Sentinel because I can't have a third node, my idea is to run the two nodes in replication, decide myself when the master node is down, and promote the slave to master. Are there any issues with this? When the original master comes back up, is there a way to configure it as a slave?
Does this sound like a good idea? I am open to suggestions other than Redis.

Generally, running two nodes is never a good idea because it is bound to run into the split-brain problem: when the network between the two nodes is down for a moment or two, each node will inevitably think the other is offline, promote itself to (or keep itself as) master, and start accepting requests from other services. That is when split brain happens.
If you are OK with this possible situation, you can look into setting up master-slave replication with the help of a script file and an HA service like pacemaker or keepalived.
Typically you have to tell the cluster manager, through a predefined rule, which node is your preferred master when the two machines rejoin after a split-brain condition.
When a master is elected, the cluster manager executes the script, which runs slaveof no one on the new master and slaveof <new-master-ip> <port> on the other node.
You could go one step further in your script file and try to merge the two data sets together but whether that's achievable or not is entirely down to how you have organized your data in Redis and how long you are prepared to wait for to have all the data in sync.
I have done it this way myself before, using pacemaker+corosync.
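The promotion step described above can be sketched in Python as a small helper that works out which command each node should receive. The function name, the node addresses, and the default port are assumptions for illustration; actually sending the commands (via redis-cli or a client library) is left to the script that pacemaker or keepalived calls.

```python
def failover_commands(new_master, nodes, port=6379):
    """Map each node to the Redis command it should run after an election."""
    if new_master not in nodes:
        raise ValueError(f"{new_master} is not a known node")
    commands = {}
    for node in nodes:
        if node == new_master:
            # The elected node stops replicating and becomes master.
            commands[node] = "SLAVEOF NO ONE"
        else:
            # Every other node starts replicating from the new master.
            commands[node] = f"SLAVEOF {new_master} {port}"
    return commands
```

For two hypothetical nodes 10.0.0.1 and 10.0.0.2, with 10.0.0.2 elected, this yields SLAVEOF NO ONE for 10.0.0.2 and SLAVEOF 10.0.0.2 6379 for 10.0.0.1.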

Ok, partial solution with SLAVEOF:
You can manually promote slave to master by running:
SLAVEOF NO ONE
You can manually transition master to slave by running:
SLAVEOF <HOST> <port>
Clustering should be disabled.

If you brought the replica online manually by changing it to replicaof no one, you need to be careful to bring the failed master back online as a replicaof the new node, so you don't overwrite more recent data. I would not recommend doing this manually. You want to minimize downtime, so automated failover is ideal.
You mention being open to other products. Check out KeyDB which has the exact configuration you are looking for. It is a maintained multi-threaded fork of redis which offers the active-replica scenario you are looking for. Check out an example of it here.
Run both nodes as replicas of each other accepting reads and writes simultaneously (depending on upfront proxy config). If one fails the other continues to take the full load and is already sync'd.
Regarding the split brain concern, KeyDB can handle split brain scenarios where the connection between masters is severed, but writes continue to be made. Each write is timestamped and when the connection is restored each master will share their new data. The newest write will win. This prevents stale data from overwriting new data written after the connection was severed.
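To make the "newest write wins" rule concrete, here is a small Python sketch of a last-write-wins merge. It only illustrates the general rule described above and is not KeyDB's actual implementation; the (timestamp, value) layout and the function name are assumptions.

```python
def merge_last_write_wins(a, b):
    """Merge two {key: (timestamp, value)} maps, keeping the newest write per key."""
    merged = dict(a)
    for key, (ts, value) in b.items():
        # Take b's entry when the key is new or b's write is more recent.
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged
```

Note that a rule like this depends on the nodes' clocks being reasonably in sync, and that on an exact timestamp tie this sketch keeps the entry from a.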

I would recommend having at least 3 nodes with a Sentinel setup, to enable gossip/quorum for automatic promotion of a slave to master when the current master node goes down.

I believe it is possible to create a cluster with two nodes with the commands below:
$ redis-cli --cluster create <ip-node1>:7000 <ip-node1>:7001 <ip-node2>:7000 <ip-node2>:7001 --cluster-replicas 1
To resolve the split-brain problem, you can add a third node without data:
$ redis-cli -h <ip-node1> -p 7000 cluster meet <ip-node3> 7000
$ redis-cli -h <ip-node1> -p 7000 cluster nodes
I think it works.

Create SidekiqWorkers on a server by receiving information from another server

Sorry if my title doesn't make much sense. My problem is this: I have one application hosted on a server, the application uses a database hosted on the same server, and the same server also runs Sidekiq to process a lot of queues.
One problem is that a lot of memory is used and everything runs very slowly. Even though I have an 8-core processor, I can't take advantage of it when processing queues, because the application was developed on MRI and is using Unicorn.
I was thinking of moving everything that processes the queues to a different server, installing Puma and JRuby there, and processing the queues there (this should be a lot faster, since it can take advantage of multiple cores).
All the data processed by Sidekiq comes from a database and is stored in a database (currently the same database for both reading and writing). Most of the Sidekiq workers receive some information and use it to look up other information, so they need to connect to the same DB as the app.
What would be a good solution for serving the same database to 2 different applications?
And is it a good idea to have another server with Puma and JRuby installed for Sidekiq only (and maybe other things in the future)?
Thank you

Even with MRI and Unicorn you can take advantage of multiple cores: just run multiple Unicorn processes, or use the clustered mode provided by Puma. The same goes for Sidekiq. There is no need to switch to JRuby right away.
Accessing the database from multiple applications is no problem. But do yourself a favor and use a dedicated database server. It makes adding more application servers way easier.

App engine remote shell via a proxy when using Python

I am using the Google Appengine remote shell, with Python. I am walking through an entire database table updating all my entities, and I am doing this in 500 entity chunks. This is all working fine. The task involves
fire up the remote shell
kick off the job
wait 10 minutes
rinse, repeat
I'd like to keep this up while I'm at work, and just do it in the background, without of course impacting my productivity :-). What's getting in my way is the firewall, which prevents this sort of transfer of data, when logged in over VPN.
So is there a way to do this, like in a separate Emacs shell? If I had two computers, I'd just run this thing on my spare, but I don't. (I do have an iPad, but I doubt that helps).
I may be misunderstanding the core issues, and hence, my question.

Rather than using the remote shell, it'll probably be easier - and certainly quicker - to run the job entirely on the server via the mapreduce API.
