SQL Server [Distribution clean up: distribution] job suspended - sql-server

I have several publications in SQL Server 2016 in my test environment. When my distribution clean up job runs, it runs forever without deleting anything.
I did some digging and discovered that the job is actually blocked by one repl-logreader on the server. However, when I checked Replication Monitor for all the publications on the server, all of them showed a good status without any latency, and their undistributed commands are '0'.
What else should I look at to solve this issue?

Related

Service stopped after the SQL Server instance was rebooted

I work with Delphi 2010 and SQL Server 2012. The database components are SDAC.
I have a service that works with the database and has several threads, each of which executes different requests against the database. Every one of these threads has a connection timeout and a command timeout, and of course its own tmsconnection. At one point the SQL Server instance itself was rebooted by the admin.
Operations in the main thread coped with this without a problem. One of the other threads logged an error on its connection and handled it, but then hung right afterwards, I think at the beginning of its next iteration. The other threads, which only had a connection initially, handled the situation without any problems.
Please tell me: does this mean that every time a secondary thread connects to the database, I run the risk of it hanging if the server is rebooted at that moment, and that I can do nothing to protect against it?

SSIS packages take a long time or eventually fail

I had packages deployed on SQL Server 2008 R2, and we recently migrated to a new server machine running SQL Server 2012. I configured the packages for project deployment mode, and for 10 days all packages worked smoothly, with execution times in the same range as on the older server.
For the last two days, packages have been failing. I checked in detail and found that they take longer than usual and then fail with "Protocol error in TDS stream, communication link failure and remote host forcibly closed the connection".
When I run the packages through SSDT, they complete successfully, but the data moves more slowly than it used to, so package execution time is much longer.
I am not sure what has changed. I searched the internet for possible reasons, checked the server memory and packet size, and tried to match them with the older server, which did not solve the problem. I suspect SSIS logging may have caused this, but I am not sure how to check.
Please help to identify the cause of this problem.
**Edit: I enabled logging in SSDT and could see that the majority of the time is spent in the row transfer steps. Since my package has lookups, I thought the lookups might be slowing it down somehow, so I copied the main query to SSMS and ran it as a normal query on this server.
About 13 lakh (1.3 million) rows were returned in 12 minutes. Then I ran the same query on the old server, where it returned the same 13 lakh rows in less than a minute. So this suggests the problem is related to data transfer and is not specific to the packages themselves.
Can someone help, please?**
Check the connection in the solution: its 'RetainSameConnection' property should be set to 'true'. This can be done both in the SSIS package under the connection manager properties and in the job step properties (Configuration > Connection Managers).
Link: http://www.sqlerudition.com/what-is-the-retainsameconnection-property-of-oledb-connection-in-ssis/

SQL Server job hangs when calling an SSIS package until agent is restarted

I have googled and read many questions/answers, but only one question has ever sounded exactly the same and it did not have an answer.
The situation:
My group has several SQL Servers that are running SQL Server 2017. They are configured virtually identically.
These servers are build boxes, meaning they pull data from a data warehouse or an extract file, run some ETL processing, and then push to a prod box. SSIS packages are deployed on the box where the DB resides.
Just over a month ago (with no updates having occurred), one of these servers started having an issue where all the jobs that ran an SSIS package would "hang" on the step that ran the package. Any other step runs fine. But a job step that runs a package (all jobs do this), will not even start the package. The package shows no indication in the executions that anything has even tried to start it.
If the user executes the deployed package it will run successfully.
The only thing that will "fix" the issue is restarting the agent service.
I created a simple job to run a simple package every 5 mins. It had been running for about a week, the last time it ran was 4/11/2021 at 2:40am, the 2:45 run hung. I could find nothing in the event logs that occurred at that time. The server was rebooted as a normal scheduled process at 3:15 and was online by 3:25 because that is the next time it tried to run and it again just hung. So even a server reboot did not fix the issue.
I am at my wits' end. Since there is no error (the job hangs and the package does not even start), there is no logging that I can find showing any issues, and I am at a loss as to what might cause this.
Thanks in advance.
Take a look at the SSISDB catalog database on each of the servers involved. Has it grown so much that the history etc. needs to be cleared down or the settings changed? How big are the transaction logs for those databases?

SQL Server 2008 R2 Job Launched Step 1 hundreds of times

I have an ETL SSIS package that is scheduled via job to run nightly at 7pm. It is the only step in the job, and the failure action is "quit the job reporting failure". The server is Windows Server 2008 R2, and the SQL Server version is 2008 R2. There is also an instance of SQL Server 2012 installed on this server, but the services are not started for that instance.
I've made no changes to the job, package, or server, and tonight it behaved strangely. When I look at the history of the job and expand tonight, it shows starting step 1 over 400 times, all at exactly 7 PM. It looks like it just kept launching it until the transaction log filled the entire drive and had no more space to grow, then exited the job reporting failure. I shrunk the transaction log by setting recovery mode to simple and running DBCC SHRINKFILE. I then restarted all of the SQL services for that instance and re-ran the job. So far, it seems to be running as expected, although I suppose time will tell.
I searched Stack Overflow and have seen nothing like this mentioned. We're actually starting a project to virtualize the box and then upgrade to 2012, so this may end up being one of those oddball things that never happens again, but I thought I'd ask in case anyone has any idea why this might have happened.
Open the job step and go to the Advanced tab. Look at the retry attempts. Could it be that it is set to a big number? That would make the step run many times if it fails.
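To illustrate why a large retry count could produce hundreds of "step 1" entries, here is a minimal Python sketch of a retry loop. It is only a model of the behaviour described above, not SQL Server Agent's actual implementation; the names `run_step_with_retries` and `always_fails` are made up for the example.

```python
from typing import Callable


def run_step_with_retries(step: Callable[[], bool], retry_attempts: int) -> int:
    """Run `step` until it succeeds or the retries are used up.

    Returns the total number of times the step was launched.
    Hypothetical model only, not SQL Server Agent's real code.
    """
    launches = 0
    # One initial attempt plus up to `retry_attempts` retries.
    for _ in range(retry_attempts + 1):
        launches += 1
        if step():
            break  # success: no further retries are needed
    return launches


def always_fails() -> bool:
    """Stand-in for a package that fails on every run."""
    return False


if __name__ == "__main__":
    print(run_step_with_retries(always_fails, retry_attempts=0))    # single launch
    print(run_step_with_retries(always_fails, retry_attempts=400))  # many launches
```

In this model a step that keeps failing is launched once plus once per configured retry, so a count in the hundreds would be enough to explain a history like the one in the question.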

SQL Server 2005 Replication Stops after a minute without an error

Background:
I have a SQL Server 2005 setup with master, slave1, slave2 replication set up as a pull replication from slaves. The distribution database resides on the slave1 machine, both slaves pull.
A problem began today where the replication on slave1 simply stops running. It claims that it completed successfully, but it does not restart, and manually starting the process finishes in roughly one minute, again without an error message.
Replication is running fine on slave2, but I can't seem to figure out what's wrong on slave1. I've tried the obvious Windows debugging 101: "restart the machine" technique, but to no avail.
Has anyone encountered this before? Does anyone have an idea of what I could check or change to get it working again? I'm especially at a loss as SQL Server claims that the job is just finishing successfully.
Though I'm unsure why this began occurring, it appears to be due to the use of a custom SQL Server Replication Agent profile. Switching to the default profile got it working again.
