Any suggestions on logging ETL processes? - sql-server

I need to create logging for ETL processes, and a typical process looks something like this:
A file arrives on the SFTP server, which fires a trigger that moves the file and runs a SQL Agent job. The SQL Agent job contains multiple steps, such as running stored procedures and SSIS packages (which in turn call stored procedures, etc.).
Also, there is a lot of interdependency: a procedure may be used by multiple processes, and a single file trigger might kick off multiple ETL processes. Anyway, you get the picture. I am currently creating a logging database for each tool, and I was wondering if anyone here has suggestions on a simple way to track such dependencies to make sure everything ran as intended.

I suggest logging the arrival of each file, with its name and date of arrival, and having the subsequent steps that use it relate their work back to this file in their own logging.
e.g.
FileID FileName ArrivedAt
2 BLA.txt 2022-07-27 10:00
Then an SSIS package, or proc, which runs as a result of the arrival of this file can log something like "Package DOTHINGS.dtsx ran at [time] using fileID 2".
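For illustration, the pair of tables might look something like this (all names are just examples):

CREATE TABLE dbo.FileLog (
    FileID    int IDENTITY(1,1) PRIMARY KEY,
    FileName  nvarchar(260) NOT NULL,
    ArrivedAt datetime2 NOT NULL DEFAULT SYSDATETIME()
);

CREATE TABLE dbo.ProcessRunLog (
    RunID       int IDENTITY(1,1) PRIMARY KEY,
    FileID      int NOT NULL REFERENCES dbo.FileLog (FileID),
    ProcessName nvarchar(128) NOT NULL,   -- package, proc, or job step
    RanAt       datetime2 NOT NULL DEFAULT SYSDATETIME(),
    Succeeded   bit NULL
);

Because every run row carries the FileID it worked from, you can reconstruct the full chain of processes per file, even when one file fans out into several ETL jobs.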

Related

SSRS - Using ReportServer AddEvent, not always processing subscriptions

I'm trying to create a homegrown subscription system for our ReportServer for QA purposes. I'm creating subscriptions on the front end and setting them on a schedule to execute once in the past (1/1/2019 for example) so the only way they would be executed is manually.
I looked at the jobs and they are simply executing a procedure "ReportServer.dbo.AddEvent" with EventType and EventData as parameters. When I execute that procedure, it adds a record to the Event table and then the Notifications table for the RS to pick up and process the subscription. Most of the time it works but sometimes it just hangs. I'm finding this for data-driven subscriptions especially. They are stuck in pending/processing and never execute fully.
Has anyone else had issues with this before? I'm building out a homegrown system because we have 200+ reports that we want to execute on a nightly basis. They are usually pretty quick but I want to only execute a handful at a time and I'm taking care of this through a custom Queue table.
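For reference, the manual execution I'm doing amounts to something like this (the EventType value shown is the standard one for schedule-driven subscriptions; the GUID is just a placeholder for the SubscriptionID):

EXEC ReportServer.dbo.AddEvent
    @EventType = N'TimedSubscription',
    @EventData = N'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx';  -- SubscriptionID GUID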
Figured it out from the log files: the subscription was data-driven, but each version had the same file name, and it was trying to overwrite a file that was still open.

Create SQL Server Agent job by using existing job script?

I have two SQL Server Agent jobs: Job1 and Job2. Since Job2 is very similar to Job1, I right-clicked on Job1 > Script Job As > Create To and used that script to create Job2.
Now I'm seeing that any change I make to the schedule of Job2 also affects Job1, and I'm assuming that's happening because both have the same @schedule_uid.
So, two questions:
Is it correct to generate a job by using the SQL script of another job? If so, how can I fix this problem where changes made to one job affect the other?
Thanks.
Schedules are distinct objects within SQL Server and, as you have found, are independent of the job, referenced by an ID.
If you are creating jobs via script, you just need to either not assign a schedule, create a new schedule for every job as well, or define a set of possible schedules that fit your requirements/maintenance windows and specify the correct ID in your script.
Obviously any two jobs that share a schedule will both be affected if you change that schedule, so if you foresee a lot of individual job management/tweaking, it may be best to have your script create a new schedule and then reference it in the job creation.
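A minimal sketch of that approach using the standard msdb procedures (job and schedule names are illustrative):

DECLARE @schedule_id int;

-- create a dedicated schedule for the new job
EXEC msdb.dbo.sp_add_schedule
    @schedule_name = N'Job2_Nightly',
    @freq_type = 4,                  -- daily
    @freq_interval = 1,
    @active_start_time = 010000,     -- 01:00:00
    @schedule_id = @schedule_id OUTPUT;

-- attach it to the job instead of reusing Job1's schedule
EXEC msdb.dbo.sp_attach_schedule
    @job_name = N'Job2',
    @schedule_id = @schedule_id;

Alternatively, simply remove the @schedule_uid parameter from the scripted sp_add_jobschedule call and SQL Server will create a brand-new schedule for the job.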
If you look at your script you will see that the @schedule_uid is the same. If you fetch a new @schedule_uid (and all the other hardcoded IDs) from the tables in the msdb database, you will get a correctly running job.

DatabaseChangeLog and Lock tables generated on every run

I am working on a project that manages many databases on a handful of servers, and we are using Liquibase to keep our database structures current.
As I understand it, Liquibase is meant to generate the DATABASECHANGELOG and DATABASECHANGELOGLOCK tables the first time it recognizes that a changelog file is being executed against a specific database.
The problem is that in one of our environments, a couple of the databases seem to try to generate those tables on every run, so the script we get from Liquibase keeps including the initial table creation every time a script is generated. The actual DB changes are generated as expected, along with their DATABASECHANGELOG entries, and I can keep generating and running scripts as long as the CREATE TABLE statements for the initial tables are ignored on every run.
This is happening on databases that work correctly on another server. The majority of the databases on the 'problem' server work correctly as well, so only a few of them are experiencing this problem.
If anyone knows which part of Liquibase performs the check that decides whether the tables need to be generated, or how I can stop the run from trying to create them every time, it would be very much appreciated.
Thanks
Pretty much every Liquibase command checks for the tables and creates them if needed. If you want to know 'which part of Liquibase' is doing this, it is typically the StandardChangeLogHistoryService and the StandardLockService.
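One pattern worth checking (an assumption on my part, not something Liquibase reports directly): the tables may exist but under a different schema or default catalog than the one the connection resolves to, in which case the existence check comes up empty and the CREATE statements get re-emitted. A quick way to see where the tables actually live:

SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME IN ('DATABASECHANGELOG', 'DATABASECHANGELOGLOCK');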

Get SQL Agent job status without polling?

I'm trying to find a way to have the SQL Server 'SQL Agent' run a particular piece of code on job step events. I had hoped that there was a way using SMO to register a callback method so that when job steps begin or change status, my code is called. I'm not having any success. Is there any way to have these events pushed to me, rather than polling?
There is no Smo, DDL or trace event exposed for job execution (as far as I can see from Books Online), so I don't think you can do what you want directly. It would be helpful if you could explain exactly what your goal is (and your MSSQL version) and someone may have a useful suggestion. For example, a trace may be a better idea if you want to gather audit or performance data.
In the meantime, here are some ideas (mostly not very 'nice' ones):
Convert your jobs into SSIS packages (they have a full event model)
Build something into the job steps themselves
Log job step completion to a table, and use a trigger on the table to run your code (see the sketch after this list)
Run a trace with logging to a table and use a trigger on the table to run your code
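For the log-table-plus-trigger idea, a minimal sketch (all object names are hypothetical):

CREATE TABLE dbo.JobStepLog (
    LogID       int IDENTITY(1,1) PRIMARY KEY,
    JobName     sysname NOT NULL,
    StepName    sysname NOT NULL,
    CompletedAt datetime NOT NULL DEFAULT GETDATE()
);
GO

CREATE TRIGGER dbo.trg_JobStepLog_Notify
ON dbo.JobStepLog
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- react to the new row(s) here: write to another table,
    -- send a Service Broker message, fire an alert, etc.
END;
GO

-- each job step appends a row as its final action:
INSERT INTO dbo.JobStepLog (JobName, StepName) VALUES (N'MyJob', N'Step 1');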

Getting stored procedure usage data on SQL Server 2000

What is the best way to get stored procedure usage data for a specific database out of SQL Server 2000?
The data I need is:
Total of all stored procedure calls over X time
Total of each specific stored procedure call over X time.
Total time spent processing all stored procedures over X time.
Total time spent processing specific stored procedures over X time.
My first hunch was to set up SQL Profiler with a bunch of filters to gather this data. What I don't like about this solution is that the data will have to be written to a file or table somewhere, and I will have to do the number crunching myself to figure out the results I need. I would also like to get these results over the course of many days as I apply changes, to see how the changes are impacting the database.
I do not have direct access to the server to run SQL Profiler so I would need to create the trace template file and submit it to my DBA and have them run it over X time and get back to me with the results.
Are there any better solutions to get the data I need? I would like to get even more data if possible but the above data is sufficient for my current needs and I don't have a lot of time to spend on this.
Edit: Maybe there are some recommended tools out there that can work on the trace file that Profiler creates to give me the stats I want?
Two options I see:
Re-script and recompile your sprocs to call a logging sproc (see the sketch after these two options). That logging sproc would be called by every sproc you want perf tracking for, and it writes to a table with the sproc name, the current datetime, and anything else you'd like.
Pro: easily reversible, as you'd have a copy of your sprocs in a script that you could easily back out. Easily queryable!
Con: performance hit on each run of the sprocs that you are trying to gauge.
Recompile your data access layer with code that writes to a log text file at the start and end of each sproc call. Are you inheriting your DAL from a single class where you can insert this logging code in one place?
Pro: no DB messiness, and you can swap the assembly in and out when you want to stop the perf measurement. Could even be toggled on/off in app.config.
Con: disk I/O.
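For the first option, a sketch of the logging table and sproc using SQL 2000-compatible types (all names are hypothetical):

CREATE TABLE dbo.ProcUsageLog (
    ProcName   sysname NOT NULL,
    CalledAt   datetime NOT NULL DEFAULT GETDATE(),
    DurationMs int NULL
);
GO

CREATE PROCEDURE dbo.usp_LogProcCall
    @ProcName sysname,
    @DurationMs int = NULL
AS
    INSERT INTO dbo.ProcUsageLog (ProcName, DurationMs)
    VALUES (@ProcName, @DurationMs);
GO

-- then, inside each tracked sproc:
-- DECLARE @start datetime, @ms int
-- SET @start = GETDATE()
-- ... existing body ...
-- SET @ms = DATEDIFF(ms, @start, GETDATE())
-- EXEC dbo.usp_LogProcCall @ProcName = 'dbo.MyProc', @DurationMs = @ms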
Perhaps creating a SQL Server Trace outside of SQL Profiler might help.
http://support.microsoft.com/kb/283790
This solution involves scripting all your tracing options yourself; the output is written to a trace file. Perhaps it could be modified to dump into a log table.
Monitoring the traces: http://support.microsoft.com/kb/283786/EN-US/
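For reference, a minimal server-side trace along the lines of those articles. The event and column IDs below are the documented ones for SP:Completed, Duration, StartTime, and ObjectID; the file path is a placeholder:

DECLARE @TraceID int, @maxsize bigint, @on bit
SET @maxsize = 50     -- MB per rollover file
SET @on = 1

EXEC sp_trace_create @TraceID OUTPUT, 2, N'C:\Traces\proc_usage', @maxsize
-- event 43 = SP:Completed; column 13 = Duration, 14 = StartTime, 22 = ObjectID
EXEC sp_trace_setevent @TraceID, 43, 13, @on
EXEC sp_trace_setevent @TraceID, 43, 14, @on
EXEC sp_trace_setevent @TraceID, 43, 22, @on
EXEC sp_trace_setstatus @TraceID, 1   -- start the trace

-- later, load the file into a table for the number crunching
-- (join ObjectID to sysobjects to recover the proc names):
SELECT * INTO dbo.ProcTraceResults
FROM ::fn_trace_gettable(N'C:\Traces\proc_usage.trc', DEFAULT)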
