I am running MariaDB 5.5.2 on two CentOS 7.1.1503 bare-metal Dell servers. The servers are each 16 months old. They were never rebooted until July 2017. Call the first server salt01 and the second one salt02. salt02 was rebooted first, salt01 next.
Since then, we have noticed that the DB on salt02 is missing entries we see on salt01. The missing records coincide with the reboot; that is, data is missing since then, but earlier data is present on salt02.
iptables is not running on these two servers.
This appears to be a replication issue.
We have two ways to fix this:
1. Follow a re-sync procedure that goes like this:
At the master:
RESET MASTER;
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
mysqldump -u root -p --all-databases > /a/path/mysqldump.sql
UNLOCK TABLES;
and on slave:
STOP SLAVE;
mysql -uroot -p < mysqldump.sql
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=valuefromshowmasterstatus;
START SLAVE;
2. Fix replication:
We notice this in the file /etc/my.cnf:
bind-address = 127.0.0.1
on salt02, which is believed to be the slave. How critical is this? We could point bind-address at the master salt01 and restart mariadb on salt02.
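For reference, a hedged way to check the current replication state before deciding (assuming root shell access on both boxes):
# on salt02, the presumed slave: shows the configured master host, the IO/SQL thread states and any Last_Error
mysql -u root -p -e "SHOW SLAVE STATUS\G"
# on salt01, the presumed master: shows the current binlog file and position
mysql -u root -p -e "SHOW MASTER STATUS;"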
I was wondering which way to go here. I'm not a DBA. Many thanks for your thoughts. Any questions, feel free to ask.
I'm trying to back up and restore a Postgres database. Access to Postgres is through pgpool.
To take the backup I am following the official Postgres pg_dumpall documentation.
Commands taken from postgres website: https://www.postgresql.org/docs/14/app-pg-dumpall.html
$ pg_dumpall > db.out
To reload database(s) from this file, you can use:
$ psql -f db.out postgres
The backup works fine.
However, when attempting to restore, I get the following error because of the replication feature enabled by pgpool.
psql:tmp/backup/postgresDump/pg_data.out:15: ERROR: database "xyz" is being accessed by other users
DETAIL: There is 1 other session using the database.
Here are the ideas I tried after browsing other SO questions.
I tried to update active='f' in the pg_catalog.pg_replication_slots view. It failed with the error below:
DETAIL: Views that do not select from a single table or view are not automatically updatable.
HINT: To enable updating the view, provide an INSTEAD OF UPDATE trigger or an unconditional ON UPDATE DO INSTEAD rule.
I listed the process IDs for the replication slot and used pg_terminate_backend with the PID, followed by the restore.
Command to terminate the replication slot's backend:
psql -U postgres -h pgpool.default.cluster.local -c "SELECT pg_terminate_backend(3402)"
pg_terminate_backend
----------------------
f
(1 row)
As per the second answer in this link, Postgresql - unable to drop database because of some auto connections to DB, I executed the pg_terminate_backend command multiple times until it returned 0 results. Although this step was successful, the restore failed with the error:
psql:tmp/postgresDump/pg_data.out:14: ERROR: database "xyz" is being accessed by other users
DETAIL: There is 1 other session using the database.
It looks like as soon as I drop a replication slot, pgpool recreates the replication slot and establishes a connection for it.
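For what it's worth, a hedged way to see exactly which sessions are still holding the database open (the database name xyz and the pgpool host are taken from above):
psql -U postgres -h pgpool.default.cluster.local -c "SELECT pid, usename, application_name, client_addr FROM pg_stat_activity WHERE datname = 'xyz';"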
I also tried dropping the replication slot, following the official documentation:
https://www.postgresql.org/docs/9.5/functions-admin.html
psql -U postgres -h pgpool.default.local -c "select pg_drop_replication_slot('repmgr_slot_1001');"
ERROR: replication slot "repmgr_slot_1001" is active for PID 3402
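A hedged sketch of chaining the two calls, with the slot name and host taken from above; pgpool/repmgr may simply re-create the slot, and the drop can still fail if the walsender has not exited yet:
psql -U postgres -h pgpool.default.cluster.local -c "SELECT pg_terminate_backend(active_pid) FROM pg_replication_slots WHERE slot_name = 'repmgr_slot_1001';"
psql -U postgres -h pgpool.default.cluster.local -c "SELECT pg_drop_replication_slot('repmgr_slot_1001');"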
Any information on how to run the restore through psql is highly appreciated.
Requirement in detail:
I have two databases (both are in sync). If one goes down, the Spring Boot application starts throwing exceptions. In this case I want the application to connect to the second database.
Please help me with this.
Thanks in advance.
As you have a Data Guard implementation in Oracle, with a primary database and another one in standby mode, Oracle Transparent Application Failover is the way to go.
Transparent Application Failover (TAF) is a feature of the Java Database Connectivity (JDBC) Oracle Call Interface (OCI) driver. It enables the application to automatically reconnect to a database, if the database instance to which the connection is made fails. In this case, the active transactions roll back.
Database Setup
I am assuming your implementation of DG uses Oracle Restart.
Database: TESTDB
Service in TAF: TESTDB_HA
Primary site
srvctl add service -d testdb -s testdb_ha -l PRIMARY -y AUTOMATIC -e select -m BASIC -z 200 -w 1
srvctl start service -d testdb -s testdb_ha
Standby site
srvctl add service -d testdb -s testdb_ha -l PRIMARY -y AUTOMATIC -e select -m BASIC -z 200 -w 1
srvctl modify service -d testdb -s testdb_ha -failovermethod basic
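To verify the service definition and where it is currently running, a hedged check (database and service names as above):
srvctl config service -d testdb -s testdb_ha
srvctl status service -d testdb -s testdb_ha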
Your JDBC connection
jdbc:oracle:thin:@(description=(address=(host=primaryserver)(protocol=tcp)(port=yourdbport))(address=(host=standbyserver)(protocol=tcp)(port=yourport))(failover=yes)(connect_data=(service_name=testdb_ha)(failover_mode=(type=select)(method=basic))))
In this setup, in case of a failover from primary to standby, the connection keeps working once the failover is completed, without manual intervention.
I am currently using this configuration in applications running in Kubernetes, using Spring Boot and/or Hibernate, and in plain JBoss Java applications. I have personally tested failover scenarios that were totally transparent to the applications. Obviously, if you have a transaction or query running at the moment the failover is performed, you will get an error. But you don't need to manually change any JDBC settings when switching from the primary site to the standby site.
I have a DB in Postgres. The DB is big, with a total size over 4 TB, more than 500,000 tables, and many indexes. The DB is over 4 years old.
Recently, the Pgsql DB server was not starting up, so I did the following to get it started again:
/usr/pgsql-9.3/bin/pg_resetxlog -f /var/lib/pgsql/9.3/data
/usr/pgsql-9.3/bin/pg_ctl -D /var/lib/pgsql/9.3/data stop
/usr/pgsql-9.3/bin/pg_ctl -D /var/lib/pgsql/9.3/data start
/usr/pgsql-9.3/bin/pg_ctl -D /var/lib/pgsql/9.3/data stop
systemctl restart postgresql-9.3
Since then I am getting the following error whenever I try to create a new table in the DB:
mps_schools=> create table test_test(hello int);
ERROR: right sibling's left-link doesn't match: block 19 links to 346956 instead of expected 346955 in index "pg_depend_reference_index"
I have tried re-indexing the DB, but it doesn't work. What more can I do?
pg_resetxlog destroyed your database. That can easily happen, which is why you don't run it just because the database won't start; it's a last-ditch effort to get a corrupted database up.
What can you do?
Best solution: restore from a backup from before you ran pg_resetxlog.
Perform an offline backup of your database.
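A minimal sketch of such an offline, file-level backup, using the paths and service name from the question; the backup target path is an assumption:
systemctl stop postgresql-9.3
tar -czf /backup/pgdata-9.3-backup.tar.gz -C /var/lib/pgsql/9.3 data
# leave the server stopped; the next step runs in single-user mode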
Then start the database in single user mode:
postgres --single -P -D /your/database/directory yourdbname
Then try to reindex pg_depend:
REINDEX TABLE pg_catalog.pg_depend;
Exit the single user session, restart the database, run pg_dumpall to dump the database (and hope that it works), create a new database cluster with initdb and import the dump.
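A hedged sketch of that dump-and-rebuild step; the new data directory and dump file locations are assumptions, and everything except systemctl should run as the postgres OS user:
systemctl start postgresql-9.3
pg_dumpall -U postgres > /backup/full_dump.sql
systemctl stop postgresql-9.3
/usr/pgsql-9.3/bin/initdb -D /var/lib/pgsql/9.3/data_new
/usr/pgsql-9.3/bin/pg_ctl -D /var/lib/pgsql/9.3/data_new -l /tmp/pg_new.log start
psql -U postgres -f /backup/full_dump.sql postgres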
Don't continue using the cluster where you ran pg_resetxlog.
I want to replicate PostgreSQL data from a Windows server to a Linux server. I know how to set up replication between the same operating systems, but that method is not working between Windows and Linux. If this is possible, what would be the best way to do it?
You cannot use streaming replication between different operating systems.
Look at the PostgreSQL Wiki for a list of replication solutions. Some of them should work for you.
From PostgreSQL v10 on, you could consider logical replication.
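A minimal sketch of what that looks like, assuming PostgreSQL 10 or later on both sides; the database, user, and publication/subscription names are placeholders:
# on the Windows publisher (requires wal_level = logical):
psql -U postgres -d mydb -c "CREATE PUBLICATION win_pub FOR ALL TABLES;"
# on the Linux subscriber (the table definitions must already exist there):
psql -U postgres -d mydb -c "CREATE SUBSCRIPTION linux_sub CONNECTION 'host=windows-host dbname=mydb user=repuser password=secret' PUBLICATION win_pub;"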
I have done this using PostgreSQL 9.5.21 as master on Windows 2012 R2 and a slave on Ubuntu 14.04.
You have to take care of a few things:
a very similar CPU (page size, architecture, registers): you can't mix 64-bit and 32-bit, or use CPUs with different endianness or page size;
the same endianness for the OS as well: both 32-bit or both 64-bit;
the same major version of PG: 9.5.x with the same or another 9.5.x version (that applies to streaming replication, which I'm using; logical replication works across different PG versions).
So, I found an already-installed PG on the Windows server. I edited postgresql.conf to enable replication and PITR, and pg_hba.conf to allow the connection.
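Roughly the settings involved, as a sketch for 9.5 (the slave's IP address is a placeholder):
# postgresql.conf on the Windows master:
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 64
# pg_hba.conf on the Windows master, to let the Ubuntu slave replicate:
host  replication  postgres  <slave-ip>/32  md5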
Then I moved to Ubuntu and, with PG stopped, fetched a base backup from the master with:
pg_basebackup -D /tmp/db/ -X stream -R -U postgres -h ip-master
Then I adjusted the configuration and replaced the data directory with /tmp/db.
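A hedged sketch of that swap on Ubuntu 14.04; the Debian-style data directory path is an assumption:
sudo service postgresql stop
sudo mv /var/lib/postgresql/9.5/main /var/lib/postgresql/9.5/main.old
sudo mv /tmp/db /var/lib/postgresql/9.5/main
sudo chown -R postgres:postgres /var/lib/postgresql/9.5/main
sudo chmod 700 /var/lib/postgresql/9.5/main
sudo service postgresql start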
I started the slave, and it is up and running, but look at this:
2020-03-18 21:05:31.598 CET [44640] LOG: database system is ready to accept read only connections
2020-03-18 21:05:31.631 CET [44645] LOG: started streaming WAL from primary at 36/C2000000 on timeline 1
2020-03-18 21:05:31.905 CET [44646] [unknown]@[unknown] LOG: incomplete startup packet
2020-03-18 21:05:32.416 CET [44649] postgres@postgres FATAL: database locale is incompatible with operating system
2020-03-18 21:05:32.416 CET [44649] postgres@postgres DETAIL: The database was initialized with LC_COLLATE "Italian_Italy.1252", which is not recognized by setlocale().
2020-03-18 21:05:32.416 CET [44649] postgres@postgres HINT: Recreate the database with another locale or install the missing locale.
Here's the funny thing: replication works, but you can't connect to the databases.
Anyway, if you raw copy the data dir on Windows, it works like a charm.
Of course, if you re-create the cluster with UTF-8, there's no problem at all.
N.B.: thanks a lot to incognito and ilmari on the official PG IRC channel for the hints.
I am stuck on a ClearCase MultiSite issue syncing a particular VOB. I have already tried multitool chepoch on the master replica (say, M) to retrieve epoch table entries from the remote replica (say, R), e.g. mt chepoch -actual R@vob-path. I understand that after this the master should start exporting packets as per the epoch table of the remote (or something like that).
I have also tried using recoverpacket from the master, e.g. mt recoverpacket -since <last successful import date from lshistory of the vob on R> R@vob-path. This too, as I understand it, is another way to 'sync' the epoch tables from the remote to the master by specifying a date.
All solutions on the internet, including IBM's support website, point to the same fixes I just mentioned. The general idea is: get the epoch table on the master to match the remote and let ClearCase do the rest.
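For what it's worth, a hedged way to compare the epoch tables on the two sites (replica and VOB names as above):
# run at each site and diff the outputs
multitool lsepoch R@vob-path
multitool lsepoch M@vob-path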
The problem is that the VOB on the remote replica is WAYYY behind the master. So the master keeps exporting packets and the remote keeps storing them in its incoming bay, where they accumulate to several GB. The scheduled sync_receive job fails to import these packets, saying "packet depends on changes not yet received". But these changes never actually arrive from the master.
I have started to suspect that the master is not sending packets older than some point, which is why the incoming bay on my remote keeps accumulating the 'newer' ones.
Is there anything else that I can try here?
Help is MUCH appreciated!
Thanks
Aashish.
If the destination VOB (the remote replica) is somehow corrupted, one workaround would be to re-create that remote replica VOB, following a process similar to "How to move a VOB using ClearCase MultiSite", in order to re-export the whole VOB to a brand-new replica.
That would involve a multitool mkreplica -export/import.
Usually, the command to type at your site is:
multitool chepoch -actual replica:{remote-replica}@{vob-tag}
followed by a multitool syncreplica -export command.
I recommend something like:
multitool syncreplica -export -max 500m -out {packet-name} {replica-name}@{vob-tag}
I recommend using the -max option in case there is too much data to replicate; this avoids the creation of a 5 GB packet, for example.
The -out option is useful as well, because the packet will be generated but not shipped to the remote site. That way, you can check whether the sync packet is actually created or not. If the packet is created, you can then transfer it to the remote site using the mkorder command.
The main reason why a packet would not be generated is that the oplog has been scrubbed.
By default, if I remember correctly, oplogs older than 180 days (to be confirmed) are scrubbed; they are not kept forever. You should check the file /usr/atria/config/vob/vob_scrubber_params on your VOB server to see how long your oplogs are kept.
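A hedged way to check both the site-wide and the per-VOB scrubber settings (the VOB storage path is a placeholder):
# site-wide defaults
cat /usr/atria/config/vob/vob_scrubber_params
# per-VOB override, if one exists, in the VOB storage directory
cat /path/to/vobstore/yourvob.vbs/vob_scrubber_params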
See IBM doc about scrubbing
If you do have a 3rd site, and if your server has been scrubbed, you can try to generate the packet from the 3rd site.
The last resort is indeed to recreate the replica, as suggested by VonC.