ClearCase MultiSite sync issue

I am stuck on an issue with ClearCase MultiSite syncing of a particular VOB. I have already tried multitool chepoch on the master replica (say, M) to retrieve the epoch table entries of the remote replica (say, R), e.g. mt chepoch -actual R#vob-path. I understand that after this the master should start exporting packets according to the remote's epoch table (or something like that).
I have also tried using recoverpacket from the master, e.g. mt recoverpacket -since <last successful import date from lshistory of the vob on R> R#vob-path. This too, as I understand it, is another way to 'sync' the epoch tables from remote to master by specifying a date.
Everything I have found online, including IBM's support website, points to the same approaches I just mentioned. The general idea is: get the epoch table on the master to match the remote and let ClearCase do the rest.
The problem is that the VOB on the remote replica is WAYYY behind the master. So the master keeps exporting packets and the remote keeps storing them in its incoming bay, where they accumulate to several GB. The scheduled sync_receive job fails to import these packets, saying "packet depends on changes not yet received". But these changes never actually arrive from the master.
I have started to suspect that the master is not sending packets older than some point, which is why the incoming bay on my remote keeps accumulating only the 'newer' ones.
Is there anything else that I can try here?
Help is MUCH appreciated!
Thanks
Aashish.

If the destination VOB (the remote replica) is somehow corrupted, one workaround would be to re-create that remote replica VOB, following a process similar to "How to move a VOB using ClearCase MultiSite", in order to re-export the whole VOB to a brand-new replica.
That would involve a multitool mkreplica -export/-import.

Usually, the command to run on your site is:
multitool chepoch -actual replica:{remote-replica}#{vob-tag}
Followed by a "multitool syncreplica -export" command
I recommend something like
multitool syncreplica -export -max 500m -out {packet-name} {replica-name}#{vob-tag}
I recommend using the -max option in case there is too much data to replicate; it avoids creating a single 5 GB packet, for example.
The -out option is useful as well because the packet will be generated but not shipped to the remote site. That way, you can check whether the sync packet is actually created or not. If the packet is created, you can then transfer it to the remote site using the mkorder command.
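From memory, creating and shipping such an order looks something like this (the packet path and host name are placeholders; check the mkorder reference page for the exact options on your release):
multitool mkorder -data /var/tmp/{packet-name} -fship {remote-host}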
The main reason why a packet would not be generated is that the oplog has been scrubbed.
By default, if I remember correctly, oplogs older than 180 days (to be confirmed) are scrubbed and are not kept forever. You should check the file /usr/atria/config/vob/vob_scrubber_params on your VOB server to see how long your oplogs are kept.
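For instance, on the VOB server you can inspect it with:
cat /usr/atria/config/vob/vob_scrubber_params
# typical oplog retention lines look roughly like this (illustrative only;
# confirm the exact directives against the scrubber documentation for your release):
#   oplog -keep 180
#   oplog -keep_all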
See IBM doc about scrubbing
If you do have a third site, and if the oplogs on your server have been scrubbed, you can try to generate the packet from that third site.
Last resort is indeed to recreate the replica, as suggested by VonC.


How can I get alerted if the master's GTID differs from the slave's?

MaxScale distributes requests to the MariaDB master/slave servers on which the database is located.
What I need is a script, running as a cron job or something similar, that verifies the GTIDs of the master and the slaves. If a slave's GTID differs from the master's GTID, I want to be informed/alerted via email.
Unfortunately, I have no idea whether this is possible and, if so, how to do it.
You can enable gtid_strict_mode to automatically stop replication if GTIDs from the same domain conflict with what is already in the binlogs. If you are using MaxScale, it will automatically detect this and stop using that server.
Note that this will not prevent transactions from other GTID domains from causing problems with your data. This just means you'll have to pay some attention if you're using multi-domain replication.
If you want to be notified of this, you can use the script option in MaxScale to trigger a custom script to be launched whenever the server stops replicating.
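Either way, the check itself boils down to comparing the two GTID positions. A minimal cron-style sketch (the hostnames, credentials and email address are placeholders, and it assumes single-domain replication with the mysql client and mailx available):
#!/bin/sh
# Compare the master's current GTID position with a slave's applied position
MASTER_GTID=$(mysql -h master.example.com -u monitor -psecret -N -e "SELECT @@gtid_current_pos")
SLAVE_GTID=$(mysql -h slave1.example.com -u monitor -psecret -N -e "SELECT @@gtid_slave_pos")
if [ "$MASTER_GTID" != "$SLAVE_GTID" ]; then
    echo "GTID mismatch: master=$MASTER_GTID slave=$SLAVE_GTID" \
        | mailx -s "Replication GTID mismatch on slave1" dba@example.com
fi
Keep in mind that a brief difference is normal replication lag, so in practice you would re-check or allow a grace period before alerting.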

MediaWiki installation issue - port problems

I am trying to install MediaWiki version 1.31 locally and I have run into some issues that I can't get past. Mainly, when I enter the database connection information (I am trying to connect to a PostgreSQL database), it returns this error.
The thing is, the port I am trying to connect to is 5433, not 5432. Also, the names "template1" and "postgres" are not included in anything I entered through the dialogue screen - I don't know where they came from. "test1" is the name of the database I am trying to connect to.
Any help or advice on how to get through this error would be greatly appreciated. Thank you.
The fact that the port you specify is not used while setting up the database schema is a long-standing known bug. One workaround is to run your database on the default port until you have the wiki set up, then change it back to the port you want.
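A minimal sketch of that workaround, assuming a Debian-style PostgreSQL layout (adjust the config path and service name for your install):
# temporarily move PostgreSQL to the default port for the installer run
sudo sed -i 's/^port = 5433/port = 5432/' /etc/postgresql/10/main/postgresql.conf
sudo systemctl restart postgresql
# ... run the MediaWiki installer against port 5432 ...
# then revert the port to 5433, restart again, and set $wgDBport = 5433 in LocalSettings.php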
In order to create a new database, you need to connect to an existing database in the same cluster. 'template1' and 'postgres' are pre-existing databases (usually created at the time the cluster was created) commonly used to connect to in order to create a new database. These names are "well-known", you don't need to specify them.

Documentation on PostgreSQL service reload not interrupting open transactions?

I have a version 9.5 PostgreSQL database in production that has constant traffic. I need to change a value in the pg_hba.conf file. I have confirmed on a test server that this can be put into effect by reloading the postgresql service.
I have read on other posts and sites that calling pg_ctl reload does not cause interruptions of live connections in PostgreSQL, e.g. https://dba.stackexchange.com/questions/41517/is-it-safe-to-call-pg-ctl-reload-while-doing-heavy-writes
But I am trying to find concrete documentation stating that calling pg_ctl reload or service postgresql-9.5 reload does not interrupt or affect any open transactions or ongoing queries on the db.
Here it is, straight from the horse's mouth (the pg_ctl documentation):
reload mode simply sends the postgres process a SIGHUP signal, causing it to reread its configuration files (postgresql.conf, pg_hba.conf, etc.). This allows changing of configuration-file options that do not require a complete restart to take effect.
This signal is used by lots of other servers, including Apache and NGINX, to reread their configuration files without dropping open connections.
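For reference, any of the following triggers the same SIGHUP-based reload (the data directory path is only an example for a typical 9.5 package install):
pg_ctl reload -D /var/lib/pgsql/9.5/data       # as the postgres OS user
service postgresql-9.5 reload                  # via the init script, as in the question
psql -U postgres -c "SELECT pg_reload_conf();" # from SQL, as a superuser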
If you are unconvinced, try this after opening the psql client or pgAdmin or whatever:
START TRANSACTION;
/* open a console and run /etc/init.d/postgresql reload or the equivalent for your system */
INSERT INTO my_table(id,name) VALUES (1,'1');
COMMIT;
If the client has been disconnected you will be notified.

Syslog-ng not logging to empty sqlite database

We're developing an application based on Yocto (distro Poky 1.7), and now we have to implement the logger, so we have installed the one already provided by our meta-oe layer:
Syslog-ng 3.5.4.1
libdbi 0.8.4.1
libdbi-drivers 0.8.3
The installation completed without any problems and syslog-ng runs correctly, except that it doesn't write to an existing SQLite database.
In the syslog-ng.conf file there is just one source, the default UNIX stream socket /dev/log, and one destination, a local SQLite database (of just 4 columns). A simple program that writes 10 log messages using the C API syslog() is used for testing.
If the database already exists, empty or not, no log message is written into it when the demo program runs;
If the database doesn't exist, syslog-ng creates it and is able to write log messages until the board is rebooted. After that, we fall back into the first condition, so no more log messages can be saved into the db.
After some days spent on this issue, I've found that this behaviour could be due to this SQL statement (in the function afsql_dd_validate_table(...) in afsql.c):
SELECT * FROM tableName WHERE 0=1
I know that this is a useful statement to check for the existence of the table called 'tableName', with WHERE 0=1 as an always-false condition to avoid scanning the whole table.
With syslog-ng debugging enabled, it seems that the previous statement doesn't return any information about the columns, so syslog-ng thinks they don't exist and tries to add them, which causes an error since they already exist. That's why it doesn't write anything to the database.
Modifying the SQL query to this one:
SELECT * FROM tableName
I'm still unable to write any log message to the database if it is empty, but now everything works correctly if a dummy record (row) is added when the database is created.
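For illustration, the seeding can be done with something like this (the database path, table and column names here are only examples and must match what is configured in syslog-ng.conf):
sqlite3 /var/log/messages.db "CREATE TABLE IF NOT EXISTS logs (datetime TEXT, host TEXT, program TEXT, message TEXT);"
sqlite3 /var/log/messages.db "INSERT INTO logs VALUES (datetime('now'), 'localhost', 'seed', 'dummy row so column detection succeeds');"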
But this should not be the right way to do it. Has anyone faced this issue and found a solution for making syslog-ng log to an empty SQLite database?
Many thanks to everybody
Regards
Andrea

Do connection string DNS lookups get cached?

Suppose the following:
I have a database set up on database.mywebsite.com, which resolves to IP 111.111.1.1, running from a local DNS server on our network.
I have countless ASP, ASP.NET and WinForms applications that use a connection string utilising database.mywebsite.com as the server name, all running from the internal network.
Then the box running the database dies, and I switch over to a new box with an IP of 222.222.2.2.
So, I update the DNS for database.mywebsite.com to point to 222.222.2.2.
Will all the applications and computers running them have cached the old resolved IP address?
I'm assuming they will have.
Any suggestions along the lines of "don't have your IP change each time you switch box" are not too welcome as I cannot control this aspect of the situation, unfortunately. We are currently using the machine name of the box, which changes every time it dies and all apps etc. have to be updated with the new machine name. It hurts.
Even if the DNS is not cached local to the machine, it will likely be cached somewhere along the DNS chain between the machine and the name servers, at least for a short while. My understanding is this situation would usually be handled with IP takeover where you just make the new machine 111.111.1.1.
Probably a question for serverfault.
You're looking for the DNS TTL (Time To Live), I guess. In principle, applications should cache the IP for at most the value of the TTL. I'm afraid, however, that some applications/technologies actually cache it longer (which, again in my opinion, is completely wrong).
Each machine will cache the IP address.
The length of time it is cached is the TTL (Time To Live). This is a setting on your DNS server; if you set it very low, say 5 minutes, then you should be up and running fairly quickly after a switch. A bit of a hack, but it should work.
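For example, if your local DNS server uses BIND-style zone files, the per-record TTL is the second field of the A record (here 300 seconds, i.e. 5 minutes, with the addresses from the question):
database.mywebsite.com.    300    IN    A    222.222.2.2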
Yes, the other comments are correct in that what controls this is the DNS TTL set for the hostname database.mywebsite.com.
You'll have to decide on the maximum amount of time you're willing to wait after a failure on your primary address (111.111.1.1) before the switch to the secondary address takes effect. Lower settings will give you a quicker recovery time, but will also increase the load and bandwidth on your DNS server, because clients will have to re-query it more often to refresh their cache.
You can run nslookup with the -d option from your command prompt to see the default and remaining TTL times for the records you are querying.
%> nslookup -d google.com
You should assume that they are cached, for two reasons not clearly mentioned before:
1- Many "modern" versions of OS families do DNS caching.
2- Many applications do DNS caching or have poor error/failure detection on live connections and/or opening new connections. This would possibly include your database client.
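If you want to see or clear the OS-level cache by hand on a Windows client (where I assume most of these ASP/WinForms apps run), you can use:
C:\> ipconfig /displaydns
C:\> ipconfig /flushdns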
Also, this is probably not well documented. I did some googling, and found this for MySQL:
http://dev.mysql.com/doc/refman/5.0/en/connector-net-programming-connecting-connection-string.html#connector-net-programming-connecting-errors
It does not clearly explain its behavior in this regard.
I had a similar issue with a web site that disables the application pool recycling features and runs for weeks on end. Sometimes, a clustered SQL Server box would restart and, for some reason, my SqlConnections were not reconnecting. I was getting the error:
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
The server was there - and running - in fact, if I just recycled the app pool, the app would work fine - but I don't like recycling app pools!
The connections that were being held in the connection pool were somehow using old connection information, which could have been old IP addresses. This is what seems so similar to the poster's question: it appears to be cached DNS information, because as soon as some sort of cache is cleared, the app works fine.
This is how I solved it - by forcing all of the connections in the pool to be re-created:
Try
    ' Example: SqlDependency, but this could also be any SqlConnection.Open call
    Dim result As Boolean = SqlClient.SqlDependency.Start(ConnStr)
Catch sqlex As SqlClient.SqlException
    SqlClient.SqlConnection.ClearAllPools()
End Try
The code sample is just the boiled-down basics - it should be tweaked for your situation!
The DNS gets cached, but for any server that resolves to the wrong IP address, you can update the HOSTS file on that server and the new IP will be picked up immediately. This could be a solution if you have a limited number of servers accessing your database server.
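For example, with the addresses from the question, an entry like this in the HOSTS file would pin the name to the new box until you remove it:
222.222.2.2    database.mywebsite.com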
