I'm looking for ideas on how to automatically track the job that calls the package.
We have some generic packages that are called from different jobs; each job passes in different file paths as parameters and therefore processes files of very different sizes depending on the path.
In the package I have some custom auditing set up which basically tracks the package start time and end time, and therefore the duration of execution. I want to also track the job that called the package, so that if the package is running long I can determine which job called it.
Also note I would prefer this to be automatic, possibly using some sort of system variable or similar, so that human error is not an issue. I also want these auditing tasks built into all of our packages as a template, so I would prefer not to use a user variable either, as different packages may use different variables.
Just looking for some ideas - appreciate any input
We use parent and child packages instead of different jobs calling the same package. You could send the information about which parent called it to the child package, and then have the child package record that data to a table along with the start and end dates.
Our solution has a whole meta database that records all the details through logging of each step. The parent tells the child which configuration to use and logs details against that configuration. The jobs call the parent package, never the child package (which doesn't have a configuration in the config table, as it is always configured through variables sent in by the parent package). No human intervention is necessary, except initial development or research when a failure occurs.
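For illustration, here is a minimal sketch of the kind of table such a meta/logging database might contain; all of the object and column names below are hypothetical, not the actual schema described above.

```sql
-- Hypothetical log table: the child package inserts a row when it starts
-- and updates EndTime when it finishes.
CREATE TABLE dbo.PackageExecutionLog (
    LogId             int IDENTITY(1,1) PRIMARY KEY,
    ParentPackageName nvarchar(260) NOT NULL,  -- which parent package drove this run
    ChildPackageName  nvarchar(260) NOT NULL,
    ConfigurationName nvarchar(128) NULL,      -- configuration the parent told the child to use
    StartTime         datetime2(3)  NOT NULL,
    EndTime           datetime2(3)  NULL
);
```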
Edit for existing jobs.
Consider that jobs can have multiple steps. Make the first step a SQL script that inserts the auditing information into a table, including the start time of the package, the name of the job that called it, and the name of the SSIS package being called. Then the second step calls the SSIS package, and the last step is a SQL script that inserts the same data, only with the end datetime.
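A rough sketch of what those first and last job-step scripts could look like; the table, job, and package names are placeholders, and each job would hard-code its own name in its steps.

```sql
-- Job step 1: record the start of the run.
INSERT INTO dbo.JobPackageAudit (JobName, PackageName, EventType, EventTime)
VALUES (N'Load Finance Files', N'GenericFileLoad.dtsx', N'Start', SYSDATETIME());

-- Job step 2 runs the SSIS package itself.

-- Job step 3: insert the same data, this time marking the end of the run.
INSERT INTO dbo.JobPackageAudit (JobName, PackageName, EventType, EventTime)
VALUES (N'Load Finance Files', N'GenericFileLoad.dtsx', N'End', SYSDATETIME());
```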
A simple way to do this is to set up a variable on your SSIS package as a varchar. Set its value to @[System::ParentContainerGUID] using an expression when the package starts. SQL Agent won't set the value, so when run as an individual job it will be an empty string. But if the package is called by another package, it will contain the GUID of the calling package. You can test for that value and use a precedence constraint to control the program logic.
We have packages that run as a part of a big program but sometimes we need to run them individually. Each package has an email on failure task but we only want that to execute when the package is run individually. When it is part of the big run we collect the names of all packages that error and send them as one email from the master package. We don't want individual emails and a summary email going out on the same run.
Occasionally I need to run tasks that update data in a database. I might need to run them again on a new server, or never again - no idea. For now I need to run them once, and in a certain release only. And they should stay outside of the git index.
Some tutorials suggest that I run them with a "custom migration", in which a second directory for migrations is created, called "custom_migrations", and they'll be run from there via Ecto.Migrator. But this will cause a problem: I run all of the custom_migrations, then delete all of the migration files (because I won't need them anywhere else, not on a new server either, once I've run them), then create new ones when a need arises, and then Ecto.Migrator will complain about the absence of the migrations that I've deleted.
I'm also aware of ./bin/my_app eval MyApp.Tasks.custom_task1, but it's not convenient because I'll have to call it manually, and passing arguments to a function via the command line isn't convenient either.
What I want is: create several files that should be run in the current release, once. Store them in a certain directory of the application. Deploy the application. They get run automatically, probably on application boot, and then I remove them. Then, after some time, I may want to create new ones, and only those new ones will need to be run.
How can I do this? What's the recommended way in Elixir/Phoenix?
I have an SSIS job that is scheduled to run every 5 minutes via SQL Agent. The job imports the contents of an Excel file into a SQL table. That all works great, but the files get placed there sporadically, and often when the job runs there is no file there at all. The issue is that this causes the job to fail and send a notification email that the job failed, but I only want to be notified if the job failed while processing a file, not because there was no file there in the first place. From what I have gathered I could fix this with a script task to check if the file is there before the job continues, but I haven't been able to get that to work. Can someone break down how the script task works and what sort of script I need to check if a file exists? Or if there is some better way to accomplish what I am trying to do, I am open to that as well!
The errors I get when I tried the Foreach Loop approach are shown in the image attached to the question.
This can be done easily with a Foreach Loop Container in SSIS.
Put simply, the container will check the directory you point it at and perform the tasks within the container for each file found. If no files are found the contents of the container are never executed. Your job will not fail if no files are found. It will complete reporting success.
Check out this great intro blog post for more info.
In the image attached to the question, the specific errors are related to the Excel Source failing validation. When SSIS opens a package for editing or running, the first thing it does is validate that all of the artifacts needed for a successful run are available and conform to the expected shape/API. Since the expected file may not be present, right-click on the Excel Connection Manager and, in the Properties menu, find the setting DelayValidation and change it to True. This ensures the connection manager only validates that the resource is available if the package is actually going to use it, i.e. when execution passes into the Foreach Loop Container. You will also need to set DelayValidation to True on your Data Flow Task.
You did not mention what scripting approach you're applying to search for your file. While C# and VB.NET are the typical languages used in a Script Task of this nature, you can also use T-SQL that simply returns a boolean value saved to a user variable (sometimes systems limit the use of C# and VB.NET). You then apply that user variable in the control flow to determine whether to import (boolean = 1) or not (boolean = 0).
Take a look at the following link, which shows in detail how to set up the T-SQL script that checks whether or not a file exists.
Check for file exists or not in sql server?
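For reference, a minimal sketch of one way such a T-SQL check can look, using the undocumented master.dbo.xp_fileexist extended procedure; the path below is a placeholder, and an Execute SQL Task would map the single-row result to a user variable.

```sql
-- Returns 1 if the file exists, 0 otherwise; map FileExistsFlag
-- to an int/boolean user variable in the Execute SQL Task.
DECLARE @FileExists int;
EXEC master.dbo.xp_fileexist N'\\fileshare\imports\latest.xlsx', @FileExists OUTPUT;
SELECT @FileExists AS FileExistsFlag;
```

Note that xp_fileexist checks the path as the SQL Server service account, so permissions on the share matter.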
Take a look at the following link, which shows how to apply a conditional check based on a boolean user variable. This example also shows how to apply VB.NET in a script task to determine if the file exists (as an alternative to the aforementioned T-SQL approach).
http://sql-articles.com/articles/bi/file-exists-check-in-ssis/
Hope this helps.
I've worked a lot with Pentaho PDI so some obvious things jump out at me.
I'll call Connection Managers "CMs" from here on out.
Obviously, Project CMs > Package CMs for extensibility/reusability. It seems a rare case indeed where you need a Package-level CM.
But I'm wondering about another best practice: should each Project CM itself be composed of variables (or parameters, I guess)?
Let's talk in concrete terms. There are specific database sources; let's call two of the ones in use Finance2000 and ETL_Log_db. These have specific connection strings (password, source, etc.).
Now if you have 50 packages pulling from Finance2000 and also using ETL_Log_db ... well ... what happens if the databases change? (host, name, user, password?)
Say it's now Finance3000.
Well I guess you can go into Finance2000 and change the source, specs, and even the name itself --- everything should work then, right?
Or should you simply build a project-level connection called "FinanceX" or whatever and compose it of parameters, so the connection string is something like #Source + #credentials + #whatever?
Or is that simply redundant?
I can see one benefit of the parameter method: you can change the "logging database" on the fly, even within the package itself during execution, instead of passing parameters merely at runtime. I think. I don't know. I don't have a mountain of experience with SSIS yet.
SSIS, starting from version 2012, has the SSIS Catalog DB. You can create all your 50 packages in one Project, and all these packages share the same Project Connection Managers.
Then you deploy this Project into the SSIS Catalog; the Project automatically exposes Connection Manager parameters with a CM prefix. The CM parameters are part of the Connection Manager definition.
In the SSIS Catalog you can create so-called Environments. In an Environment you define variables with a name and datatype, and store their values.
Then - the most interesting part - you can associate the Environment with the uploaded Project. This allows you to bind a project parameter to an environment variable.
At package execution you specify which Environment to use when specifying connection strings. Yes, you can have several Environments in the Catalog, and choose one when starting the package.
Cool, isn't it?
Moreover, passwords are stored encrypted, so no one can copy them. Values of these Environment Variables can be configured by support engineers who have no knowledge of SSIS packages.
More Info on SSIS Catalog and Environments from MS Docs.
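If you prefer scripting the Catalog side rather than clicking through SSMS, the same setup can be done with the SSISDB catalog stored procedures. A hedged sketch follows; the folder, project, environment, and variable names (and the connection string) are made up, and the CM parameter name assumes a connection manager called Finance2000 in the deployed project.

```sql
USE SSISDB;
GO

-- Create an environment and a variable holding the connection string.
EXEC catalog.create_environment
     @folder_name = N'ETL',
     @environment_name = N'Prod';

EXEC catalog.create_environment_variable
     @folder_name = N'ETL',
     @environment_name = N'Prod',
     @variable_name = N'Finance_ConnStr',
     @data_type = N'String',
     @sensitive = 0,
     @value = N'Data Source=FinanceHost;Initial Catalog=Finance2000;Integrated Security=SSPI;',
     @description = N'Connection string for the finance source';

-- Reference the environment from the deployed project.
DECLARE @ref_id bigint;
EXEC catalog.create_environment_reference
     @folder_name = N'ETL',
     @project_name = N'FinanceETL',
     @environment_name = N'Prod',
     @reference_type = N'R',            -- relative: environment lives in the same folder
     @reference_id = @ref_id OUTPUT;

-- Bind the project's CM.<name>.ConnectionString parameter to the environment variable.
EXEC catalog.set_object_parameter_value
     @object_type = 20,                 -- 20 = project-level parameter
     @folder_name = N'ETL',
     @project_name = N'FinanceETL',
     @parameter_name = N'CM.Finance2000.ConnectionString',
     @parameter_value = N'Finance_ConnStr',
     @value_type = N'R';                -- 'R' = use a referenced environment variable
```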
I'll share my fair share of experience.
I recently had a similar experience at work: our two main databases' names changed, and I had no issues or downtime on the schedules.
The model we use is not the best, but for this, and for other reasons, it is quite comfortable to work with. We use BAT files to pass named parameters into a "Master" job, and basically, depending on two parameters, the job runs on an alternate database/host.
In every KTR/KJB we use the variables ${host} and ${dbname}; these parameters are passed in with each BAT file. So when we had to change the names of the hosts and databases, it was a simple Replace All text match in Notepad++, and done: 2,000+ BAT files fixed, and no downtime.
Having a variable for the Host/DB Name for both Client Connection and Logging Connection lets you have that flexibility when things change radically.
You can also use the kettle.properties file for the logging connection.
In a package I have two loop containers that run fine one after the other. Each has its own variable used to iterate over and load two different sets of Excel files into the same table. As far as I can tell there is no overlap between the two loops, so I thought to speed things up by running them in parallel.
When starting the package however (manually in SSIS), the containers look like they execute but then after a few seconds the entire package shows as complete without any errors, and none of the loop containers or subsequent tasks did anything.
The package log only shows validation completed for each of the loop containers.
Is there some switch somewhere to make two loop containers play nicely?
Here is what it looks like:
Place the two loops and their corresponding script tasks (via precedence constraints) in a sequence container. Connect the Create Table script task to the sequence container. Then connect the sequence container to D Product Family data flow.
Note: disabling a task won't affect operation as SSIS will just skip over the disabled task(s) and go to the next one until all tasks have been completed.
I have one SSIS Package that must run as Proxy A and another that must run as Proxy B. I would love to have the first package run, and, as one of its tasks, execute the second package. Is this possible?
Thanks a lot!
You could have the first package use sp_start_job to kick off a job that is set up to run the second package. If this is "fire-and-forget", that's all you need to do. If you need to wait until it's completed, things get messier: you'd have to loop around calling (and parsing the output of) sp_help_jobactivity and use WAITFOR DELAY until the run completes.
This is also more complex if you need to determine the actual outcome of running the second package.
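As a rough illustration of the wait-and-poll part, something along these lines in an Execute SQL Task would start the job and block until it finishes. The job name is hypothetical, and this sketch queries msdb.dbo.sysjobactivity directly rather than parsing sp_help_jobactivity output.

```sql
DECLARE @job sysname = N'Run Package B (Proxy B)';  -- hypothetical job name

-- Kick off the job that runs the second package under its own proxy.
EXEC msdb.dbo.sp_start_job @job_name = @job;

-- Poll until no active (started but not yet stopped) session remains for the job.
WHILE EXISTS (
    SELECT 1
    FROM msdb.dbo.sysjobactivity AS a
    JOIN msdb.dbo.sysjobs AS j ON j.job_id = a.job_id
    WHERE j.name = @job
      AND a.start_execution_date IS NOT NULL
      AND a.stop_execution_date IS NULL
)
BEGIN
    WAITFOR DELAY '00:00:10';  -- check every 10 seconds
END

-- Determining whether the job actually succeeded still means
-- inspecting msdb.dbo.sysjobhistory afterwards.
```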