SQL Server 2008 R2 Job Launched Step 1 hundreds of times - sql-server

I have an ETL SSIS package that is scheduled via job to run nightly at 7pm. It is the only step in the job, and the failure action is "quit the job reporting failure". The server is Windows Server 2008 R2, and the SQL Server version is 2008 R2. There is also an instance of SQL Server 2012 installed on this server, but the services are not started for that instance.
I've made no changes to the job, package, or server, and tonight it behaved strangely. When I look at the history of the job and expand tonight, it shows starting step 1 over 400 times, all at exactly 7 PM. It looks like it just kept launching it until the transaction log filled the entire drive and had no more space to grow, then exited the job reporting failure. I shrunk the transaction log by setting recovery mode to simple and running DBCC SHRINKFILE. I then restarted all of the SQL services for that instance and re-ran the job. So far, it seems to be running as expected, although I suppose time will tell.
I did a search of stack overflow and have seen nothing like this mentioned. We're actually starting a project to virtualize the box, then upgrade to 2012, so this may end up being one of those oddball things that never happens again, but I thought I'd ask in case anyone has any idea why this might have happened.

open the job step and go to the advanced tab. Look at the retry attempts. could it be that it has a big number? this would make the step run many times if it fails.
:

Related

SQL Server [Distribution clean up: distribution] job suspended

I have several publications in SQL Server 2016 in my test environment, when my distribution clean up job runs it running forever without deleting anything.
I did some digging around and discovered that this job is actually blocked by one repl-logreader of the server, however, when I checked the replication monitor for all the publications of the server, all of them showing good status without any latency. Their undistributed commands are '0'.
What else that I should look at to solve this issue?

SQL Server job hangs when calling an SSIS package until agent is restarted

I have googled and read many questions/answers, but only one question has ever sounded exactly the same and it did not have an answer.
The situation:
My group has several SQL Servers that are running SQL Server 2017. They are configured virtually identically.
These servers are build boxes, meaning they pull data from a data ware house, or an extract file, run some ETL processing and then push to a prod box. SSIS packages are deployed on the box where the DB resides.
Just over a month ago (with no updates having occurred), one of these servers started having an issue where all the jobs that ran an SSIS package would "hang" on the step that ran the package. Any other step runs fine. But a job step that runs a package (all jobs do this), will not even start the package. The package shows no indication in the executions that anything has even tried to start it.
If the user executes the deployed package it will run successfully.
The only thing that will "fix" the issue is restarting the agent service.
I created a simple job to run a simple package every 5 mins. It had been running for about a week, the last time it ran was 4/11/2021 at 2:40am, the 2:45 run hung. I could find nothing in the event logs that occurred at that time. The server was rebooted as a normal scheduled process at 3:15 and was online by 3:25 because that is the next time it tried to run and it again just hung. So even a server reboot did not fix the issue.
I am at my wits end, since there is no error (the job hangs and the package does not even start) there is no logging that I can find that is showing any issues, I am at a loss as to what might cause this.
Thanks in advance.
Take a look at the SSISDB catalog database on each/all the servers involved. Has it grown exponentially and needs the history etc. cleared down or settings changed? How big are the transaction logs for those databases etc.?

SQL Server 2005 Replication Stops after a minute without an error

Background:
I have a SQL Server 2005 setup with master, slave1, slave2 replication set up as a pull replication from slaves. The distribution database resides on the slave1 machine, both slaves pull.
A problem began today where the replication on slave1 simply stops running. It claims that it completed successfully, but it does not restart, and manually starting the process finishes in roughly one minute, again without an error message.
Replication is running fine on slave2, but I can't seem to figure out what's wrong on slave1. I've tried the obvious Windows debugging 101: "restart the machine" technique, but to no avail.
Has anyone encountered this before Does anyone have an idea of what I could check or change to get it working again? I'm especially at a loss as SQL Server claims that the job is just finishing successfully.
Though I'm unsure of why this began occurring. It appears to be due to the use of a custom SQL Server Replication Agent profile. Switching to using the default got it working again.

Deadlock on communication buffer: SQL Server 2008 R2 running stored procedures for data warehouse

Currently running SQL Server 2008 R2 SP1 on 64-Bit Windows Server 2008 R2 Enterprise on a Intel dual 8-core processor server with 128 GB RAM and 1TB internal SCSI drive.
Server has been running our Data Warehouse and Analysis Services packages since 2011. This server and SQL instance is not used for OLTP.
Suddenly and without warning, all of the jobs that call SSIS packages that build the data warehouse tables (using Stored Procedures) are failing with "Deadlock on communication buffer" errors. The SP that generates the error within the package is different each time the process is run.
However, the jobs will run fine if SQL Server Profiler is running to trace at the time that the jobs are initiated.
This initially occured on our Development server (same configuration) in June. Contact with Microsoft identified Disk I/O issues, and suggested setting MaxDOP = 8, which has mitigated the deadlock issue, but introduced an issue where the processes can take up to 3 times longer at random intervals.
This just occurred today on our Production server. MaxDOP is currently set to zero. There have been no changes to OS, SQL Server or the SSIS packages in the past month. The jobs ran fine overnight on September 5th, but failed with the errors overnight last night (September 6th) and continue to fail on any retry.
The length of time that any one job will run before failing is not consisent nor is there consistency between jobs. Jobs that take 2 minutes to run to completion previously will fail in seconds, where jobs that normally take 2 hours may run anywhere from 30 - 90 minutes before failing.
Have you considered changing the isoloation level of the database. This can help when parallel reads and writes are happening on the database.

Scheduled jobs in Sql Agent

I have created SSIS packages to move data from AS400 to SQL Server which are scheduled daily.some of the packages in sql agent are taking longer duration more than 9 hours to complete.IF I run same package in Business intelligence studio manually, it is completing in less than 4 hours.Due to this problem my schedule packages are not competing on time.please help me to sort out this issue. I am unable to understand why there is a difference in task completion duration between manual interaction and schedule jobs.
My environment is windows server 2003 with sql server 2005 with SP3.please help me to sort out this issue.
The best way to get around this problem is to watch the scheduled task by using some debug statements and messages. For example, have some insert statements in the stored procedures the SSIS package is invoking. This way u will get to know what control is taking more time than expected. First try to isolate the control that is making the difference.
Also, you can invoke the package from command prompt using:-
dtexec /f filename.dtsx
This will print out all the messages in the console at each step as well.
Use SSIS logging in the package to log to a database table. Set logging to record start and end of tasks. By running the package in BIDS and comparing it to the logging when it is run on the server you will see which tasks are taking too long. See http://msdn.microsoft.com/en-us/library/ms138020.aspx for more info on SSIS logging (in sql 2008)
Might it be that the SQL server is less powerful than your client or has more load when you execute the package?
Business intelligence Studio the package is executed on your local client with it's CPU and RAM (I think).
Check what version of DTSEXEC you are using. May be you are using 32-bit version at one place and 64-bit at the other one.

Resources