Postgres VACUUM on replica

When my PostgreSQL setup has both a primary and a replica node, will the replica do the same after the primary runs an autovacuum?

Any data modification on the primary server (with the exception of hint bits, which are an optimization that does not influence the effective state of the database) is written to WAL, replicated and replayed on the standby server. That applies to VACUUM as well; consequently, an autovacuum run on the primary is also an autovacuum run on the standby.
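Since VACUUM is replayed from WAL rather than run independently on the standby, a manual VACUUM is actually rejected there. A quick check (the table name is a placeholder):

```sql
-- On the standby: confirm it is in recovery, and that VACUUM is rejected there.
SELECT pg_is_in_recovery();   -- returns true on a standby
VACUUM my_table;              -- ERROR:  cannot execute VACUUM during recovery
```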

Related

openGauss + keepalived active/standby switchover loses the active/standby replication relationship

I used openGauss + keepalived to build a simple high-availability environment.
Process: after simulating a failure of the primary, the VIP drifts to the standby database, whose status changes from standby to primary. When the old primary database is restarted, it preempts the VIP back. However, the replication relationship previously built with gs_ctl build -D /gaussdb/data/db1 -M standby is gone, so it has to be rebuilt manually.
Questions:
After the openGauss primary database is restored, does the previous active/standby replication relationship really disappear? Can it only be re-created manually, or can it be repaired automatically?
Is there any way to automatically re-create or repair the primary/standby replication relationship after failure recovery?
In the keepalived.conf configuration file, set the nopreempt parameter to enable non-preemptive mode, so that after the old primary recovers from the failure, the VIP is not taken back from the new primary. Note that nopreempt requires the initial state of both nodes to be set to BACKUP.
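A minimal keepalived.conf sketch of that setup (interface name, router id, and addresses are placeholders):

```
vrrp_instance VI_1 {
    state BACKUP          # both nodes start as BACKUP; nopreempt requires this
    nopreempt             # a recovered node does not take the VIP back
    interface eth0
    virtual_router_id 51
    priority 100          # use a lower priority on the other node
    advert_int 1
    virtual_ipaddress {
        192.168.1.100/24
    }
}
```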

My inserts in SQL Server are slow for few tables

I have an application which inserts data into tables. I found that for 3 tables the inserts are really slow. The schema, number of records, etc. are the same in my stage and prod environments, but the slowness is observed only in prod.
I checked that index fragmentation is low and stats are up-to-date.
Schema and constraints are the same in stage and prod. But inserts are performing well in stage, but slow in prod.
It's a client's prod, so I don't have access to the machine. What should I check next? I plan to verify that IO/disk performance is good. Any ideas on what queries I should give the client to get to the root cause of the slowness?
Typical cause of "slow inserts in prod" is that the log volume has higher latency and you're not batching inserts into transactions. Without a transaction, each insert statement requires a physical log flush.
The session wait stats should allow you (or the DBA) to pinpoint the cause of the slowness. Waits to flush the log file are tracked as WRITELOG waits. This can also be caused by having a remote synchronous replica using Mirroring or Availability Groups, which have their own wait types.
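As a hedged starting point, the client could run something like the following while the slow inserts are running (sys.dm_exec_session_wait_stats exists from SQL Server 2016; on older versions, sys.dm_os_wait_stats gives server-wide numbers instead):

```sql
-- Per-session wait stats for the application's session;
-- replace <spid> with that session's id.
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_exec_session_wait_stats
WHERE session_id = <spid>
ORDER BY wait_time_ms DESC;
```

If WRITELOG dominates, batching many inserts into one transaction is the usual first fix.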

Avoiding data duplication with logical replication (PostgreSQL 10)

I've configured two servers with redundancy setup using pcsd configuration.
Both machines run Postgres 10 with logical replication. I used the steps below for the logical replication setup.
Took PG Dump on Server1 using pg_dump command.
Restored it on Server2 with postgres 10 using pg_restore.
Made changes in the pg_hba.conf and postgresql.conf files.
Used below commands for setup of logical replication.
CREATE PUBLICATION my_publication FOR ALL TABLES;
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=Server1 port=5432 password=postgres user=postgres dbname=database1'
PUBLICATION my_publication WITH (copy_data = false);
Restarted both servers.
After the above steps, the services run fine on both (redundant) systems, but the logs show the error messages below.
...
2020-01-08 15:14:08.551 EET >LOG: logical replication apply worker for subscription "my_subscription" has started
2020-01-08 15:14:08.559 EET >ERROR: duplicate key value violates unique constraint "pk_xyz_instance"
2020-01-08 15:14:08.559 EET >DETAIL: Key (xyz_instance_id)=(103) already exists.
2020-01-08 15:14:08.560 EET >LOG: worker process: logical replication worker for subscription 23176 (PID 7411) exited with exit code 1
....
As I need the earlier data from Server1, I took a dump, restored it on the other server, and set copy_data to false to avoid duplication.
After every switchover of services from Server1 to Server2 or vice versa, these unique-constraint violation errors are seen on Server2 (where services are in the inactive state).
Is there anything I'm missing here in the setup of replication using PostgreSQL 10.11?
Is copy_data flag not working as I expected?
With asynchronous replication, it can always happen that the standby is lagging at the point of failover and some transactions are lost. If you try to use the old primary server, which may be some transactions ahead, as new standby, the databases can be inconsistent and replication conflicts like you observe can happen.
One solution would be to use synchronous logical replication, but that reduces availability unless you have more than one standby server.
The best would be to use physical replication. Not only is it simpler and more performant, but you can also use pg_rewind to quickly turn an old primary server into a new standby server.
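A rough sketch of the pg_rewind flow under these assumptions (the data directory and connection string are placeholders; the old primary must be cleanly shut down first, and pg_rewind requires wal_log_hints = on or data checksums enabled):

```
pg_ctl -D /var/lib/postgresql/10/main stop -m fast
pg_rewind \
  --target-pgdata=/var/lib/postgresql/10/main \
  --source-server='host=new_primary port=5432 user=postgres dbname=postgres'
# then add a recovery.conf pointing at the new primary and start the server as a standby
```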

Database is read-only error in Secondary Replica of Alwayson Groups

I tried many things and analyzed lots of documents but I haven't found a solution yet.
I have three virtual machines in VMware (DC, SQLServer01, SQLServer02). Both SQL Servers are members of the domain (DC). I installed a failover cluster for SQLServer01 and SQLServer02 and did the necessary configuration on SQLServer01. Then I installed SQL Server 2014 on both servers and created an AlwaysOn availability group. SQLServer01 is the primary and the other is the secondary. When I cut the connection of SQLServer01, everything is fine (the secondary becomes primary), which is the behavior I expect for that case.
However, when all servers are online, I cannot do any operation (insert, update, delete, alter, etc.) except reads on my secondary replica. I always get a "database is read only" error. In the properties of the AlwaysOn group, both primary and secondary replicas allow all connections, and readable secondary is set to "YES".
I want to perform CRUD operations even when all servers are online (i.e., do everything on the secondary replica as well).
So, do you have any suggestion or idea?
Thanks for your time and consideration.
The error occurs because writing to secondary replicas in SQL Server is not possible. Only the primary replica can host read-write databases, and an availability group can have only a single primary replica. Secondary replicas can host read-only databases only. When both replicas are available, only one of the two can be the primary and therefore support read-write. When only a single replica is available, that replica becomes the primary because there are no other replicas, and read-write operations against it are possible.
What you need to configure instead is replication.
In SQL Server, merge replication allows you to write at multiple nodes, with periodic synchronization that resolves conflicts and pushes changes to all replicas.
Peer-to-peer replication is another solution. The application layer must not allow conflicts (updates of the same row at more than one node), but it is much faster.

Difference between Stream Replication and logical replication

Could anybody tell me more about the difference between physical replication and logical replication in PostgreSQL?
TL;DR: Logical replication sends row-by-row changes, physical replication sends disk block changes. Logical replication is better for some tasks, physical replication for others.
Note that in PostgreSQL 12 (current at time of update) logical replication is stable and reliable, but quite limited. Use physical replication if you are asking this question.
Streaming replication can be logical replication. It's all a bit complicated.
WAL-shipping vs streaming
There are two main ways to send data from master to replica in PostgreSQL:
WAL-shipping or continuous archiving, where individual write-ahead-log files are copied from pg_xlog by the archive_command running on the master to some other location. A restore_command configured in the replica's recovery.conf runs on the replica to fetch the archives so the replica can replay the WAL.
This is what's used for point-in-time replication (PITR), which is used as a method of continuous backup.
No direct network connection is required to the master server. Replication can have long delays, especially without an archive_timeout set. WAL shipping cannot be used for synchronous replication.
streaming replication, where each change is sent to one or more replica servers directly over a TCP/IP connection as it happens. The replicas must have a direct network connection to the master, configured in their recovery.conf's primary_conninfo option.
Streaming replication has little or no delay as long as the replica is fast enough to keep up. It can be used for synchronous replication. You cannot use streaming replication for PITR, so it's not much use for continuous backup. If you drop a table on the master, oops, it's dropped on the replicas too.
Thus, the two methods have different purposes. However, both of them transport physical WAL archives from primary to replica; they differ only in the timing, and whether the WAL segments get archived somewhere else along the way.
You can and usually should combine the two methods, using streaming replication usually, but with archive_command enabled. Then on the replica, set a restore_command to allow the replica to fall back to restore from WAL archives if there are direct connectivity issues between primary and replica.
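As a sketch, combining the two might look like this (paths and hostnames are placeholders; recovery.conf applies to PostgreSQL 11 and earlier, where this answer's configuration lives):

```
# postgresql.conf on the master
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'

# recovery.conf on the replica
standby_mode = 'on'
primary_conninfo = 'host=master port=5432 user=replicator'
restore_command = 'cp /mnt/wal_archive/%f %p'
```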
Asynchronous vs synchronous streaming
On top of that, there's synchronous and asynchronous streaming replication:
In asynchronous streaming replication the replica(s) are allowed to fall behind the master in time when the master is faster/busier. If the master crashes you might lose data that wasn't replicated yet.
If the asynchronous replica falls too far behind the master, the master might throw away WAL the replica still needs if wal_keep_size (previously wal_keep_segments) is too low and no replication slot is used, meaning you have to re-create the replica from scratch. Or the master's pg_wal (formerly pg_xlog) might fill up and stop the master from working until disk space is freed, if wal_keep_size is set too high or a slot retains too much WAL.
In synchronous replication the master doesn't finish committing until a replica has confirmed it received the transaction. You never lose data if the master crashes and you have to fail over to a replica. The master will never throw away data the replica needs or fill up its xlog and run out of disk space because of replica delays. In exchange it can cause the master to slow down or even stop working if replicas have problems, and it always has some performance impact on the master due to network latency.
When there are multiple replicas, only one acts as the synchronous standby at a time by default; since PostgreSQL 9.6 you can require acknowledgement from several. See synchronous_standby_names.
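For example, a postgresql.conf fragment like the following (standby names are placeholders matching each standby's application_name; the ANY syntax needs PostgreSQL 10+):

```
synchronous_commit = on
synchronous_standby_names = 'ANY 1 (standby_a, standby_b)'
```

This commits once any one of the two named standbys has confirmed receipt.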
You can't have synchronous log shipping.
You can actually combine log shipping and asynchronous replication to protect against having to recreate a replica if it falls too far behind, without risking affecting the master. This is an ideal configuration for many deployments, combined with monitoring how far the replica is behind the master to ensure it's within acceptable disaster recovery limits.
Logical vs physical
On top of that we have logical vs physical streaming replication, as introduced in PostgreSQL 9.4:
In physical streaming replication changes are sent at nearly disk block level, like "at offset 14 of disk page 18 of relation 12311, wrote tuple with hex value 0x2342beef1222....".
Physical replication sends everything: the contents of every database in the PostgreSQL install, all tables in every database. It sends index entries, it sends the whole new table data when you VACUUM FULL, it sends data for transactions that rolled back, etc. So it generates a lot of "noise" and sends a lot of excess data. It also requires the replica to be completely identical, so you cannot do anything that would require a write transaction, like creating temp or unlogged tables. Querying the replica delays replication, so long queries on the replica need to be cancelled.
In exchange, it's simple and efficient to apply the changes on the replica, and the replica is reliably exactly the same as the master. DDL is replicated transparently, just like everything else, so it requires no special handling. It can also stream big transactions as they happen, so there is little delay between commit on the master and commit on the replica even for big changes.
Physical replication is mature, well tested, and widely adopted.
In logical streaming replication, new in 9.4, changes are sent at a higher level, and much more selectively.
It replicates only one database at a time. It sends only row changes and only for committed transactions, and it doesn't have to send vacuum data, index changes, etc. It can selectively send data only for some tables within a database. This makes logical replication much more bandwidth-efficient.
Operating at a higher level also means that you can do transactions on the replica databases. You can create temporary and unlogged tables. Even normal tables, if you want. You can use foreign data wrappers, views, create functions, whatever you like. There's no need to cancel queries if they run too long either.
Logical replication can also be used to build multi-master replication in PostgreSQL, which is not possible using physical replication.
In exchange, though, it can't (currently) stream big transactions as they happen. It has to wait until they commit. So there can be a long delay between a big transaction committing on the master and being applied to the replica.
It replays transactions strictly in commit order, so small fast transactions can get stuck behind a big transaction and be delayed quite a while.
DDL isn't handled automatically. You have to keep the table definitions in sync between master and replica yourself, or the application using logical replication has to have its own facilities to do this. It can be complicated to get this right.
The apply process itself is also more complicated than "write some bytes where I'm told to". It also takes more resources on the replica than physical replication does.
Logical replication implementations are less mature and less widely adopted than physical replication, and not particularly easy to use.
Too many options, tell me what to do
Phew. Complicated, huh? And I haven't even got into the details of delayed replication, slots, max_wal_size, timelines, how promotion works, Postgres-XL, BDR and multimaster, etc.
So what should you do?
There's no single right answer. Otherwise PostgreSQL would only support that one way. But there are a few common use cases:
For backup and disaster recovery use pgbarman to make base backups and retain WAL for you, providing easy to manage continuous backup. You should still take periodic pg_dump backups as extra insurance.
For high availability with zero data loss risk use streaming synchronous replication.
For high availability with low data loss risk and better performance you should use asynchronous streaming replication. Either have WAL archiving enabled for fallback or use a replication slot. Monitor how far the replica is behind the master using external tools like Icinga.
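For the backup case, a minimal Barman workflow sketch, assuming a configured server named pg (names, paths, and the target time are placeholders):

```
barman backup pg                 # take a new base backup
barman list-backup pg            # show available backups
barman recover --target-time '2020-01-08 12:00:00' \
    pg latest /var/lib/postgresql/10/main   # point-in-time restore
```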
References
continuous archiving and PITR
high availability, load balancing and replication
replication settings
recovery.conf
pgbarman
repmgr
wiki: replication, clustering and connection pooling