I've got two SQL Server instances on two separate servers.
The first one has two databases, A and B, that I'm log shipping to the secondary server. For database A it works like a charm. However, when log shipping database B we encounter errors and the database enters suspect mode, and we haven't been able to figure out why.
The secondary server used to be the main one, and it had log shipping to a third server, which stopped working. When the secondary server was our main one we never encountered log shipping issues; both databases A and B were log shipped without problems for a long period of time.
We installed the same SQL Server version on the new server as we had on the previous one and reconfigured the SQL Agent jobs to get it up and running. Database A has encountered no errors so far; however, we can't seem to keep the second database running for longer than a few hours.
All the research I've done has led nowhere. There are a few things I've seen suggested online, and we've run our tests with no success so far.
Note that database A is much larger than B, yet A has had no issues whatsoever.
We are using SQL Server 2008 R2 with Service Pack 3.
Any help from anyone who might have encountered this in the past would be very much appreciated.
Thanks
We have a client deployment of our software that is showing intermittent SQL Server connection failures, and we are struggling to understand them.
Our system consists of a SQL Server DB (2012) and 14 identical engines, each installed on a Windows 2012 VM. Each of these was created from the same template so they should be identical. The engines consist of a Windows service that connects to the DB on startup by reading a single row from a table. If the connection fails they will wait a few seconds and try again, until they get a connection.
In this particular case, the VMs were all rebooted due to a Windows Update. (The SQL server had the update/reboot about 12 hours before). They came online within a few minutes of each other. 12 of the engines started up without any problem. Two of them, however, failed to connect to the DB with:
"The underlying provider failed on Open."
Those two engines then started to poll, and continued to get this error for many hours. The rest of the engines had started up and were fine. We have a broker service too that was accessing the DB throughout and showed no connection issues.
When the client noticed this issue, they restarted the engine services on the two problem VMs, and the two engines connected to the DB just fine.
We are trying to understand what could have happened here. I guess my main questions are:
What could be an explanation of why 12 connections succeed and two fail? There's absolutely no difference as far as we know between the engines. The query itself is very simple.
Why did the connection continue to fail for those two engines until the service was restarted? This suggests to me that there is some process-level failed state that is only cleared when restarting the services. I've looked at the code to see if it was reusing the connections. It uses Entity Framework to read the single table row, and we create a fresh DbContext each time. I don't understand how this could go wrong.
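The polling behaviour described above can be sketched as follows. This is a minimal illustration in Python, not the actual engine code (which uses Entity Framework); `connect_with_retry`, `open_connection`, and `describe_chain` are hypothetical names introduced here. The point it is meant to show is that "The underlying provider failed on Open." is a wrapper message, so each failed attempt should log the whole chain of underlying exceptions, which is usually where the real cause is reported.

```python
import time


def describe_chain(exc):
    """Return the messages of exc and every underlying exception, outermost first."""
    messages = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        messages.append(f"{type(exc).__name__}: {exc}")
        # __cause__ is set by "raise ... from ..."; __context__ by implicit chaining.
        exc = exc.__cause__ or exc.__context__
    return messages


def connect_with_retry(open_connection, delay_seconds=5.0, max_attempts=None,
                       sleep=time.sleep, log=print):
    """Call open_connection() until it succeeds, with a fresh connection each time.

    open_connection is a hypothetical callable supplied by the caller; it must
    create a brand-new connection on every call rather than reuse a failed one.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return open_connection()
        except Exception as exc:
            # Log every layer, not just the outer wrapper message.
            log(f"attempt {attempt} failed: " + " <- ".join(describe_chain(exc)))
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay_seconds)
```

If the logged inner message is the same on every attempt for hours, that would support the idea of a process-level state that only a service restart clears; if it changes over time, the cause is more likely outside the process.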
We noted that there was a DBCC CHECKDB operation running on the DB around the time the services were coming up, and we wondered if this could be related to the issue. However, the client says that this runs every night and hasn't caused problems in the past. And it wouldn't explain why the engines didn't come back up again.
Thanks in advance for any help.
I have an ETL SSIS package that is scheduled via job to run nightly at 7pm. It is the only step in the job, and the failure action is "quit the job reporting failure". The server is Windows Server 2008 R2, and the SQL Server version is 2008 R2. There is also an instance of SQL Server 2012 installed on this server, but the services are not started for that instance.
I've made no changes to the job, package, or server, and tonight it behaved strangely. When I look at the history of the job and expand tonight's run, it shows step 1 starting over 400 times, all at exactly 7 PM. It looks like it just kept launching the step until the transaction log filled the entire drive and had no more space to grow, then exited the job reporting failure. I shrank the transaction log by setting the recovery model to simple and running DBCC SHRINKFILE. I then restarted all of the SQL services for that instance and re-ran the job. So far, it seems to be running as expected, although I suppose time will tell.
I searched Stack Overflow and have seen nothing like this mentioned. We're actually starting a project to virtualize the box, then upgrade to 2012, so this may end up being one of those oddball things that never happens again, but I thought I'd ask in case anyone has any idea why this might have happened.
Open the job step and go to the Advanced tab. Look at the retry attempts: could it be set to a large number? That would make the step run many times if it fails.
Background:
I have a SQL Server 2005 setup with master, slave1, slave2 replication set up as a pull replication from slaves. The distribution database resides on the slave1 machine, both slaves pull.
A problem began today where the replication on slave1 simply stops running. It claims that it completed successfully, but it does not restart, and manually starting the process finishes in roughly one minute, again without an error message.
Replication is running fine on slave2, but I can't seem to figure out what's wrong on slave1. I've tried the obvious Windows debugging 101: "restart the machine" technique, but to no avail.
Has anyone encountered this before? Does anyone have an idea of what I could check or change to get it working again? I'm especially at a loss as SQL Server claims that the job is just finishing successfully.
Though I'm unsure why this began occurring, it appears to be due to the use of a custom SQL Server Replication Agent profile. Switching to the default profile got it working again.
I'm running a very simple query in SQL Server 2000:
SELECT getDate()
Most executions are sub-second, but roughly one in 10, at random, takes about five seconds.
I am running these queries from SQL Server 2008 Management Studio, but it occurs in other clients and on other machines as well, so it is not client specific.
The query runs against a server on the same network, and there is no significant load on the server.
Can anyone tell me why this might be happening?
Sounds like network issues. We had the same thing happen when I worked for a large bank. Due to politics, it was out of our control.
You can do a few things to confirm this, like try running the queries from the server, etc.
The two things I would suspect without more information right off the bat are network latency and server load. Do you get this behavior when running the query from the database server machine itself? Do you get this behavior when running in single-user mode?
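One way to run the comparison suggested above is to time the same query many times from the client and again on the database server itself, then compare how many slow outliers each run produces. A minimal sketch in Python, assuming `run_query` is a caller-supplied function that executes `SELECT getDate()` through whichever driver is available. The names `time_calls` and `slow_runs` and the one-second threshold are illustrative and not from the original posts.

```python
import time


def time_calls(run_query, repetitions=100, clock=time.perf_counter):
    """Run run_query() repeatedly and return each call's duration in seconds."""
    durations = []
    for _ in range(repetitions):
        start = clock()
        run_query()
        durations.append(clock() - start)
    return durations


def slow_runs(durations, threshold_seconds=1.0):
    """Return (index, duration) pairs for calls slower than the threshold."""
    return [(i, d) for i, d in enumerate(durations) if d > threshold_seconds]
```

If the slow runs disappear when this is executed on the server itself, the network path is the more likely culprit; if they persist there too, the server deserves a closer look.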
I have a server with a default instance and 2 named instances of SQL Server 2005 standard installed. This is a mission critical production server that cannot be restarted during normal business hours.
Will uninstalling the two named instances of SQL Server 2005 require a reboot or put the server in a state that may cause issues with the default instance of SQL Server 2005 until it's rebooted?
This would probably get a better answer at serverfault.com.
I'm not sure how much it would help performance; if SQL Server isn't getting hit, it doesn't do much. You could probably get away with uninstalling, but then again, bad things can happen in surgery. I've never killed SQL Server by uninstalling an instance, but I have killed the client tools. I would take one of the following approaches:
a) First, back up and drop all the databases to reclaim the disk space. Then stop and disable the services for the named instances. The binaries will still be there, but they aren't too large and will be sitting idle.
b) A better long-term plan, if you can source the hardware, is to set up a brand-new box and put a clean SQL Server instance on it, then port the live server over. It's really not too painful. Then repurpose the old box as you see fit.
Is there a critical reason why you need to uninstall the named instances? Can you just ignore them?
EDIT: The answer is yes, you can uninstall via Add/Remove Programs.
No reboot occurs.
An article which might apply to your situation:
http://support.microsoft.com/?kbid=915854