Our Solr architecture is as follows:
We are facing frequent replication failures between the master and the repeater server, as well as between the repeater and the slave servers. On checking the logs, we found that one of the exceptions below occurred every time replication failed.
1)
2)
3)
The replication configuration of the master, repeater, and slaves is given below:
The commit configuration of the master, repeater, and slaves is given below:
Replication between the master and the repeater occurs every 10 minutes.
Replication between the repeater and the slave servers occurs every 15 minutes between 4 and 7 am, and every 3 hours after that.
Please help with this replication failure issue.
I am trying to use DMS to capture change logs from the SQL Server and write them to S3. I have set up a long polling period of 6 hours. (AWS recommends > 1 hour). DMS fails with the below error when the database is idle for a few hours during the night.
DMS Error:
Last Error AlwaysOn BACKUP-ed data is not available Task error notification received from subtask 0, thread 0
Error from cloud watch - Failed to access LSN '000033fc:00005314:01e6' in the backup log sets since BACKUP/LOG-s are not available
I am currently using DMS version 3.4.6 with Multi-AZ.
I always thought that DMS reads the change data immediately after the transaction log is updated with the DML changes. Why do we see this error even with a long polling period? Can someone explain what causes this issue and how we can handle it?
Over the last few days we have occasionally seen our Snowflake lookup activities fail with the error "The remote name could not be resolved: transfereu2storage1.blob.core.windows.net". Rerunning the activity results in a success.
What's very strange is that when I check our Snowflake query history, I see that the query ran successfully, so I imagine the failure happens while transferring the data back to Azure Data Factory. There doesn't seem to be any pattern to these errors; they happen randomly across any of our lookup activities, usually affecting 1-3 lookup activities out of 100 or so. Has anyone seen this error?
It might be a transient issue. Since rerunning results in success, you can try using Retry (the number of retry attempts) and Retry interval (the duration between retry attempts) in the lookup activity under General.
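For intuition, the effect of those two settings can be sketched in Python. This is only an illustration of fixed-interval retry, not how Azure Data Factory implements it; `run_with_retry`, `retries`, and `interval_seconds` are made-up names for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    activity: Callable[[], T],
    retries: int = 3,
    interval_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `activity`; on failure, wait and try again up to `retries` more times.

    `retries` plays the role of the Retry setting and `interval_seconds`
    the role of Retry interval (both names are hypothetical).
    """
    attempt = 0
    while True:
        try:
            return activity()
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt >= retries:
                raise
            attempt += 1
            sleep(interval_seconds)
```

A lookup that fails once or twice on a transient DNS error and then succeeds would come back as a success, at the cost of the waiting time between attempts.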
PROBLEM!!
After setting up my logical replication, with everything running smoothly, I wanted to dig into the logs to confirm there were no errors. But when I ran tail -f postgresql.log, I found the following error recurring: ERROR: could not start WAL streaming: ERROR: replication slot "sub" is active for PID 124898
SOLUTION!!
This is the simple solution. I went into my postgresql.conf file and searched for wal_sender_timeout on the master and wal_receiver_timeout on the slave. Both were set to 120s, and I changed both to 300s (5 minutes). Then remember to reload both servers; a restart is not required. Wait about 5 to 10 minutes and the error goes away.
We had an identical error message in our logs and tried this fix, but unfortunately our case was much more diabolical. Putting the notes here just for the next poor soul: in our case, the publishing instance was an AWS managed RDS server, and it managed (ha ha) to create such a WAL backlog that it kept going into catchup state, processing the WAL, and running out of memory (getting killed by the OS every time) before it caught up. The experience on the client side was exactly what you see here: timeouts and failed WAL streaming. The fix was kind of nasty. We had to drop the whole replication link and rebuild it (fortunately it was a test database, so no harm done, but it's a situation you want to avoid). The cause was obvious after looking at the logs on the publisher side, but from the subscription side it was more mysterious.
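If you would rather not edit postgresql.conf by hand, the same change can be made from a SQL session. Below is a small Python sketch that just builds the statements to run on each server. Note that this is a different route from the one described above: ALTER SYSTEM writes to postgresql.auto.conf instead of postgresql.conf, needs superuser rights, and cannot run inside a transaction block. The helper name `timeout_statements` and the role labels are made up for this example.

```python
from typing import List

# Which timeout setting applies on which side of the replication link.
SETTING_BY_ROLE = {
    "publisher": "wal_sender_timeout",     # the master in the post above
    "subscriber": "wal_receiver_timeout",  # the slave in the post above
}


def timeout_statements(role: str, timeout: str = "300s") -> List[str]:
    """Return the SQL to raise the WAL timeout on one server and reload its config."""
    if role not in SETTING_BY_ROLE:
        raise ValueError(f"unknown role: {role!r}")
    return [
        f"ALTER SYSTEM SET {SETTING_BY_ROLE[role]} = '{timeout}';",
        # Both settings take effect on reload; no restart is needed.
        "SELECT pg_reload_conf();",
    ]


if __name__ == "__main__":
    for role in ("publisher", "subscriber"):
        print(f"-- run on the {role}")
        print("\n".join(timeout_statements(role)))
```

Running `SHOW wal_sender_timeout;` (or `SHOW wal_receiver_timeout;`) afterwards is a quick way to confirm the reload picked up the new value.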
We ran out of space on our Production Server and during this time we started getting: "Cannot execute 'sp_replcmds' on " on Replication. The Distributor is the Publisher as well.
After fixing the space issue - this is the only error I'm getting on my Replication
We have five databases set up for Replication. The four small databases work with no error messages, except that the Last Synchronization Status says the following: "The process could not connect to Distributor "
The one large database gets the error in the subject and also that it cannot connect to the Distributor . The Error Code is: MSSQL_REPL22037
I checked the DBOwner and it is set up correctly. I stopped and started the Log Reader Agents too many times to count. I restarted the MSSQLServer Agent Processes on the Subscriber Server as well.
I solved this one myself, after trying all the other suggestions.
It was definitely the BatchSize and the QueryTimeOut properties.
In order to change this:
Launch Replication Monitor.
Expand to the Publication in question.
Go to Agents Tab.
Right-click on Log Reader Agent > Agent Profile.
Create a New Agent Profile with the new parameters you need.
Set the New Profile to 'Use for this Agent'.
Restart the Log Reader Agent and just wait.
Rinse/Repeat until you get the right amount.
I changed the QueryTimeOut from 1800 to 2400 and the BatchSize from 500 to 100.
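For anyone who runs the Log Reader Agent from a job step or command line rather than through a profile, here is a rough Python sketch that formats the same two values as agent parameters. I am assuming that the "BatchSize" profile entry corresponds to the agent's -ReadBatchSize parameter; check the parameter names in your own profile before relying on this. The helper name `log_reader_args` is made up.

```python
def log_reader_args(query_timeout_s: int, read_batch_size: int) -> str:
    """Format the two tuned values as Log Reader Agent parameters.

    Assumption: the profile's "BatchSize" is the agent's -ReadBatchSize.
    """
    if query_timeout_s <= 0 or read_batch_size <= 0:
        raise ValueError("both values must be positive")
    return f"-QueryTimeOut {query_timeout_s} -ReadBatchSize {read_batch_size}"


# Values from the post: the old profile versus the one that worked.
before = log_reader_args(1800, 500)
after = log_reader_args(2400, 100)
print(before)
print(after)
```

The direction of the change is the point: a longer timeout and a smaller batch give each read from the log more time to finish with less work to do.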
I'm attempting to populate a DB on my local SQL2008 Server using a Data Generation Plan. However, when I run it I get:
Data generation failed because of the following exception: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.. occurred 1 time(s).
I've tried setting the Connection timeout setting in the Advanced connection properties to 120 instead of 15, but I still get the error.
How do I fix this problem?
There are roughly 40 tables involved: about 20 of those get 100 rows inserted, 10 tables get ~1000 rows, and the rest get fewer than 100 rows. Also, when I exclude the trouble table the script completes successfully.
Thanks!
Go to Tools menu > Options > Database Tools > Data Generator > SQL Timeout
You may have to restart Visual Studio for the change to take effect; at least I had to.
There are also other timeout values that can be configured via the Registry (QueryTimeoutSeconds, LongRunningQueryTimeoutSeconds, LockTimeoutSeconds), though I don't understand the difference. See here.