Using variable DB Connections in different identical Kettle Jobs/Transformations

I've read through many related topics here, but don't seem to find a solution. Here's my scenario:
I have multiple identical customer databases.
I use ETL to fill special tables within these databases, to be used as a source for PowerBI reports.
Instead of copying (and thus maintaining) the ETLs for each customer, I want to pass the DB connection to the Jobs/Transformations dynamically.
My plan is to create a text file of DB connections, one row per customer:
cust1, HOST_NAME, DATABASE_NAME, USER_NAME, PASSWORD
cust2, HOST_NAME, DATABASE_NAME, USER_NAME, PASSWORD
and so on...
The host will always stay the same.
The jobs will be started monthly using Pentaho Kitchen on a Linux box.
So when I run a job for a specific customer, I want to tell the job to use the DB connection for that specific customer, e.g. cust2, from the connection file.
Any help is much appreciated.
Cheers & Thanks,
Heiko

Use parameters!
When you define a connection, you see a small $ sign in a blue diamond to the right of the Database Name input box. It means that, instead of spelling out the name of the database, you can put in a parameter.
The first time you do it, it's a bit challenging, so follow the procedure step by step, even if you are tempted to go straight to launching a ./kitchen.sh that reads a file containing a row per customer.
1. Parametrize your transformation.
Right-click anywhere, select Properties then Parameters, and fill in the table:
Row 1: Col 1 (Parameter) = HOST_NAME, Col 2 (Default value) = the host name for cust1
Row 2: Col 1 = DATABASE_NAME, Col 2 = the database name for cust1
Row 3: Col 1 = PORT_NUMBER, Col 2 = the port number for cust1
Row 4: Col 1 = USER_NAME, Col 2 = the user name for cust1
Row 5: Col 1 = PASSWORD, Col 2 = the password for cust1
Then go to the database connection definition (on the left panel, View tab) and in the Settings pane enter:
Host name: ${HOST_NAME} -- the variable name wrapped in "${" and "}"
Database name: ${DATABASE_NAME} -- do not type the name, press Ctrl+Space
Port number: ${PORT_NUMBER}
User name: ${USER_NAME}
Password: ${PASSWORD}
Test the connection. If it is valid, try a test run.
2. Check the parameters.
When you press the run button, Spoon prompts you with the run options (if you checked "Don't show this anymore" in the past, use the drop-down right next to the Run button).
Change the values of the parameters to those of cust2, and check that the transformation runs for the other customer.
Change them in both the Value column and the Default value column. You'll understand the difference in a short while; for the moment, check that it works with both.
3. Check it in command line.
Use pan from the command line.
The syntax should look like:
./pan.sh -file=your_transfo.ktr -param=HOST_NAME:cust3_host -param=DATABASE_NAME:cust3_db....
At this point, expect a small bit of trial and error, because the syntax around = and : varies slightly with the OS and the PDI version. But you should get by within 4-6 tries.
4. Make a job.
Due to the parallel processing paradigm of PDI, you cannot use the Set Variables step in the same transformation that uses the variables. You need to make a job with two transformations: the first reads the csv file and defines the variables with the Set Variables step; the second is the transformation you just developed and tested.
Don't expect it to run on the first trial. Some versions of PDI are buggy and require, for example, clearing the default values of the parameters in the transformation. The Write to log step helps here: it writes a field into the log of the calling job. Of course, you will first need to put the parameters/variables into fields with the Get Variables step.
In particular, do not start with the full customer list! Set the system up with 2-3 customers first.
Then write the full list of customers into your csv, and run.
Make a SELECT COUNT(customer) on your final load, as in the sketch below. This is important because you will probably want to load as many customers as possible, and therefore continue the process even in case of a failure. That is the default behavior (to the best of my memory), so with a large number of customers you may not notice a failure in the log.
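As a sketch of that check (the load table and column names here are assumptions, not from the question):

-- count how many distinct customers actually landed in the final load
SELECT COUNT(DISTINCT customer) AS customers_loaded
FROM monthly_load;

Compare the result against the number of rows in your csv; a mismatch tells you a customer failed even if the job itself ended green.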
5. Install the job.
In principle, it is just a ./kitchen.sh.
However, if you want to automate the load, you will have a hard time checking that nothing went wrong. So open the transformation, take the system date (fixed) from the Get System Info step, and write the result alongside the customer data. Alternatively, you can get this date in the main job and pass it along with the other variables.
If you have concerns about creating a new column in the database, store the list of customers loaded each day in another table or a file, or send it to yourself by mail. From my experience, it's the only practical way to be able to answer a user who claims that their biggest customer was not loaded three weeks ago.
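A minimal sketch of such an audit table, with hypothetical names (in PDI the INSERT would typically sit in a Table Output or Execute SQL step at the end of the job):

CREATE TABLE load_audit (
    customer  VARCHAR(50),   -- customer code from the csv
    loaded_at TIMESTAMP      -- the system date (fixed) captured above
);

INSERT INTO load_audit (customer, loaded_at)
VALUES ('cust2', CURRENT_TIMESTAMP);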

I run a similar scenario daily at work. We use batch files with named parameters for each client; this way we have the same package of KJB/KTRs that runs for different clients based entirely on those parameters.
What you want to do is set variables in a master job that are used throughout the entire execution.
As for your question directly: in the connection definition you can use those variables for the host and DB name. Personally, we have the same user/password set on every client DB, so we don't have to change those or pass user/pw as variables for each connection; we only send the host name and database with the named parameters. We also have a fixed scheduled run that executes the routine for every database, and for this we use an "Execute for each input row" type job, driven by a query like the sketch below.
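As a sketch of the driving query for that job (the connections table and its columns are hypothetical; a csv of connections could equally be read with a CSV file input step):

SELECT customer_code, host_name, database_name
FROM customer_connections;

Each row is then handed to the child job as named parameters, one execution per customer.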

Unit testing SQL scripts

For the script below, written in a .sql file:
if not exists (select * from sys.tables where name='abc_form')
CREATE TABLE abc_forms (
x BIGINT IDENTITY,
y VARCHAR(60),
PRIMARY KEY (x)
)
The script above has a bug in the table name (it checks for abc_form but creates abc_forms).
For programming languages like Java or C, the compiler resolves most names and catches such mistakes at compile time.
How should one approach unit testing a SQL script? Static analysis...
15 years ago I did something like what you request, via a lot of scripting, but we had special formats for the statements.
We had three different kinds of files:
One SQL file to set up the latest version of the complete database schema
One file for all the changes to apply to older database schemas (custom format like version;SQL)
One file for the SQL statements the code uses on the database (custom format like statementnumber;statement)
It was required that every statement be on one line, so that it could be extracted with awk!
1) First I set up the latest version of the database by executing one statement after the other and logging the errors to a file.
2) Second, I did the same with all the change scripts, to build a second schema.
3) I compared the two database schemas to find any differences (see the sketch after this list).
4) I filled the complete latest schema with some dummy test values for testing.
5) Last but not least, I executed every SQL statement against the latest schema with test data and logged every error again.
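For step 3, a minimal sketch of the schema comparison, assuming both schemas were built as databases db_latest and db_migrated (hypothetical names) on one SQL Server instance; run it a second time with the operands swapped to catch differences in the other direction:

-- columns present in the from-scratch schema but missing or different
-- in the schema built by applying the change scripts
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM db_latest.INFORMATION_SCHEMA.COLUMNS
EXCEPT
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM db_migrated.INFORMATION_SCHEMA.COLUMNS;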
In the end, the whole thing ran every night, and there was no morning without new errors that one of the 20 developers had put into version control. But it saved us a lot of time during the next install at a new customer.
You could also generate the SQL scripts from your code.
Code first avoids these kinds of problems. Choosing between code first and database first usually depends on whether your main focus is on your data or on your application.

Trying to delete records or reports in MS Access before a certain date

I have a Microsoft Access Database file.
I wanted to delete records older than 5 years in it.
I made a backup before starting to modify the file.
I was able to build a query, run the command below, and apply it to the database file as an update.
DELETE FROM Inspections Report WHERE Date <= #01/01/2013#
I used the example:
Delete by Date In Access
The records still seem to be in there.
My desired output:
An analogy to what I am trying to do: the bottom left corner of a Microsoft Word file, where it says "Page 1 of 10" but should say "Page 1 of 5" after deleting pages.
DELETE Table1.*, Table1.VisitDate
FROM Table1
WHERE (((Table1.VisitDate)<=#1/1/2013#));
I suggest you make the query object and save it, so it appears in the Navigation Pane and can be tested manually. [In that case you build it in Query Design View and don't need the syntax above.]
Then use the OpenQuery method to fire that query.
To run a SQL command from Access VBA you need to preface it with DoCmd.RunSQL or CurrentDb.Execute, and then put your SQL code in quotes.
Also, the space is probably causing an issue: if the table you're deleting records from is called "Inspections Report", you need to enclose both words in square brackets to show it's a single entity.
Finally, "Date" is a reserved word in Access, and it doesn't like it when you use it as a field name, as it can cause problems when referencing that field later on. You might try something like "InspectionDate".
So your code would look like this:
DoCmd.RunSQL "DELETE FROM [Inspections Report] WHERE InspectionDate <=#1/01/2013#"
If you have a static date, you'd probably only need to complete this process once, which you could just do in the table by filtering: filter for dates before that cutoff, use Ctrl+A to select everything that matches the criteria, and hit Delete. Access will ask if you want to delete the records, and you should see that the number it is about to delete is only the number that satisfies your criteria.
Of course, if you're interested in never having records older than a certain number of years, you could build the criterion in the original code around DateAdd("yyyy", -5, Date()) and set it to execute every time you launch the database.
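A hedged sketch of that rolling delete as a saved Access query, reusing the renamed InspectionDate field suggested above:

DELETE FROM [Inspections Report]
WHERE InspectionDate < DateAdd("yyyy", -5, Date());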

SSIS User Defined Date or Default

I'm fairly new to SSIS and don't know all its features or which tasks I can use to do the things I want. Many Google and stackoverflow.com searches have helped me get to know variables and parameters and how to set them.
BLUF (Bottom Line Up Front)
I have a view with data which I want to export to a file through a job that runs the package.
The data will be filtered by its LastUpdatedDate field, with datatype datetimeoffset(7). The package should allow a user to run it with a specified date, or else use a value from another table (SSISJobRun).
Structure
/*Employee Table*/
Id int
Name varchar(255)
LastUpdatedDate datetimeoffset(7)
/*SSISJobRun Table*/
Id int
JobTypeId int
RunDate datetimeoffset(7)
What I have Done
Currently, I'm using SSDT for VS 2015 to create my SSIS packages and deploy them to my SSIS Catalog.
I have a project with 1 package. The package contains:
a Data Flow Task named EmployeeExport; this task contains an OLE DB Source and a Flat File Destination
a package-level parameter named Filename_Export (this is so that the file path can be changed when it's run by a user; the parameter has a default value configured within the job that runs it daily)
All this runs perfectly fine.
Problem
I have also set another package-level parameter named LastUpdatedDate. The intent is to let whoever (or whatever) runs the package define a date. However, if the date is null (if I decide to use a string) or is the default value 1899-12-30 00:00:00 (if I decide to use a date), I want to determine which date to use.
Specifically, if no real date is supplied by the user, then I want the date to be the latest RunDate. For that case I use the following code:
SELECT TOP 1 LastUpdatedDate
FROM SSISJobRun
WHERE JobTypeId = 1
ORDER BY LastUpdatedDate DESC
I've tried many different ways, and it works when I supply a date, but I couldn't get it to work when the date was blank (when I used a string) or the default (when I used a date).
Here are a few sources I've been looking through to figure out my issue:
How to pass SSIS variables in ODBC SQLCommand expression?
How do I pass system variable value to the SQL statement in Execute SQL task?
http://mindmajix.com/ssis/how-to-execute-stored-procedure-in-ssis-execute-sql-task
.. and many more.
One last note: this date will be used to run two tasks, so if there is a way to keep it global, that would be great.
Lastly, I need the package to insert a row into the SSISJobRun table specifying when the task was run.
Thank you.
Use an Execute SQL Task and paste
SELECT TOP 1 LastUpdatedDate
FROM SSISJobRun
WHERE JobTypeId = 1
ORDER BY LastUpdatedDate DESC
into the statement. Set the result set to Single row; on the Result Set page, choose the variable you set and change the result name to 0.
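Note that the SSISJobRun structure in the question stores the run date in RunDate, so the lookup should presumably read that column. A hedged sketch that folds the default-date fallback into the same statement (the ? marker is OLE DB parameter syntax, and the sentinel literal mirrors the default date mentioned in the question; both are assumptions):

-- @p receives the package parameter; when the caller leaves the default,
-- fall back to the latest recorded run for this job type
DECLARE @p datetimeoffset(7) = ?;
SELECT CASE
         WHEN @p IS NULL OR @p = '1899-12-30 00:00:00 +00:00'
         THEN (SELECT MAX(RunDate) FROM SSISJobRun WHERE JobTypeId = 1)
         ELSE @p
       END AS EffectiveDate;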
As long as the same task does not run a second time within the same execution (inside a Foreach or For Loop container), and the variable is not reassigned anywhere else in the package, the variable will keep that value.
If you need to check it, right-click that Execute SQL Task, choose Edit Breakpoints, and set one post execution; then run the package, open the Watch window from the Debug tab, and drag the variable into the Watch window. You should see the value.
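For the last requirement in the question (recording when the task ran), another Execute SQL Task at the end of the package could do it; assuming Id is an identity column, a sketch:

-- log that the export (JobTypeId = 1) ran now
INSERT INTO SSISJobRun (JobTypeId, RunDate)
VALUES (1, SYSDATETIMEOFFSET());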

SQL Report Service Subscription with dynamic report parameter values

Background
I have a report with 3 parameters: AccountId, FromDate and ToDate. The report is an invoice layout. The customer wants to view all members; we have 300 members, so the system will generate 300 reports in PDF or Excel format and send them to the customer.
Question
How do I set the member id for this in a subscription? I cannot create 300 subscriptions manually, one by one. :|
If anything is not clear, please comment below and I will correct it asap.
Updated:
The data-driven subscription that Manoj describes requires the Enterprise or Developer edition of SQL Server Reporting Services.
If I don't have that, do you have any workaround?
If you are stuck with the Standard edition of SQL Server, you'll need to create an SSIS package to generate the reports. I used the article below to set up a package that now creates and emails our invoices, order confirmations, and shipping acknowledgments. I like this article over the others I found because it's easy to add more reports to it as you go along, without having to create a new package each time.
If you plan to use this for more than one report, I would change the parameter to a PK so that you know you're always going to pass in one integer, regardless of which report you're calling. Another change I made was to create one table for the report-generation piece and one for the email piece. In my case, I only want to send one email that may have multiple attachments, so that was the best way to do it. In the stored proc that builds this data, make sure you have some checks for whether the email is valid:
update TABLE
set SendStatus = 'Invalid Email Address'
where email NOT LIKE '%_@__%.__%' --valid email format
or patindex ('%[ &'',":;!=\/()<>]%', email) > 0 -- Invalid characters
or patindex ('[@.-_]%', email) > 0 -- Valid but cannot be starting character
or patindex ('%[@.-_]', email) > 0 -- Valid but cannot be ending character
or email not like '%@%.%' -- Must contain at least one @ and one .
or email like '%..%' -- Cannot have two periods in a row
or email like '%@%@%' -- Cannot have two @ anywhere
or email like '%.@%' or email like '%@.%' -- Cannot have @ and . next to each other
When I set this up, I had a lot of issues getting the default credentials to work. The SSRS service would crash every time, though the rest of the SQL services kept working. I ended up having our network guy make a new account with a static password just for this. If you do that, change the default credential line to this one:
rs.Credentials = new System.Net.NetworkCredential("acct", "p@ssword", "domain");
If you run into errors or issues, the SSRS logs and Google are your best friends. Or leave a comment below and I'll help.
Here's that article: http://technet.microsoft.com/en-us/library/ff793463(v=sql.105).aspx
Use this link for reference:
http://beyondrelational.com/modules/2/blogs/101/posts/13460/ssrs-60-steps-to-implement-a-data-driven-subscription.aspx
Create your SQL statement along the lines of the sketch below.
This will solve your problem.
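The walkthrough in that link builds the subscription around a query that returns one row per recipient, with columns feeding the report parameters and delivery settings. A hedged sketch, assuming a hypothetical Members table and the three parameters from the question:

SELECT m.AccountId,
       m.Email,
       DATEADD(MONTH, -1, CAST(GETDATE() AS date)) AS FromDate,
       CAST(GETDATE() AS date) AS ToDate,
       'PDF' AS RenderFormat
FROM Members AS m;

Each column is then mapped to a report parameter or delivery option in the data-driven subscription steps.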

Can't change data type on MS Access 2007

I have a huge database (800MB) which contains a field called 'Date Last Modified'. At the moment this field has the Text data type, but I need to change it to Date/Time to carry out some queries.
I have another, identical database with only 35MB of data inside it, and when I change the data type there it works fine. But when I try to change the data type in the big database it gives me an error:
Microsoft Office Access can't change the data type.
There isn't enough disk space or memory
After doing some research, some sites mentioned changing the registry value MaxLocksPerFile. I tried that as well, but no luck. :-(
Can anyone help please?
As John W. Vinson says here, the problem you're running into is that Access wants to hold a copy of the table while it makes the changes, and that causes it to exceed the maximum allowable size of an Access file. Compacting and repairing might help get the file under the size limit, but it didn't work for me.
If, like me, you have a lot of complex relationships and reports on the old table that you don't want to have to redo, try this variation on #user292452's solution instead:
Copy the table (e.g. 'YourTable'), then paste Structure Only back into your database with a different name (e.g. 'YourTable_new').
Copy YourTable again, and paste-append the data to YourTable_new. (To paste-append, first paste, then select Append Data to Existing Table.)
You may want to make a copy of your Access database at this point, just in case something goes wrong with the next part.
Delete all data in YourTable using a delete query: select all fields, using the asterisk, and then run with default settings.
Now you can change the fields in YourTable as needed and save again.
Paste-append the data from YourTable_new back to YourTable, and check that there were no errors from type conversion, length, etc.
Delete YourTable_new.
One relatively tedious (but straightforward) solution would be to break the big database up into smaller databases, do the conversion on the smaller databases, and then recombine them.
This has an added benefit that if, by some chance, the text is an invalid date in one chunk, it will be easier to find (because of the smaller chunk sizes).
Assuming you have some kind of integer key on the table that ranges from 1 to (say) 10000000, you can just do queries like
SELECT *
INTO newTable1
FROM yourtable
WHERE yourkey >= 0 AND yourkey < 1000000
SELECT *
INTO newTable2
FROM yourtable
WHERE yourkey >= 1000000 AND yourkey < 2000000
etc.
Make sure to enter and run these queries separately, since Access will give you a syntax error if you try to run more than one at a time.
If your keys are something else, you can do the same kind of thing, but you'll have to be a bit more tricky about your WHERE clauses.
Of course, a final thing to consider, if you can swing it, is to migrate to a different database that has a little more power. I'm guessing you have reasons that this isn't easy, but with the amount of data you're talking about, you'll probably be running into other problems as well as you continue to use Access.
EDIT
Since you are still having some troubles, here is some more detail in the hopes that you'll see something that I didn't describe well enough before:
Here, you can see that I've created a table "OutputIDrive" similar to what you're describing. I have an ID tag, though I only have three entries.
Here, I've created a query, gone into SQL mode, and entered the appropriate SQL statement. In my case, because my query only grabs values >= 0 and < 2, we'll just get one row: the one with ID = 1.
When I click the run button, I get a popup that tells/warns me what's going to happen...it's going to put a row into a new table. That's good...that's what we're looking for. I click "OK".
Now our new table has been created, and when I click on it, we can see that our one line of data with ID = 1 has been copied over to this new table.
Now you should be able to just modify the table name and the number values in your SQL query, and run it again.
Hopefully this will help you with whatever tripped you up.
EDIT 2:
Aha! This is the trick. You have to enter and run the SQL statements one at a time in Access. If you try to put multiple statements in and run them, you'll get that error. So run the first one, then erase it and run the second one, etc. and you should be fine. I think that will do it! I've edited the above to make it clearer.
Adapted from Karl Donaubauer's answer on an MSDN post:
Switch to the Immediate window (Ctrl + G)
Execute the following statement:
DBEngine.SetOption dbMaxLocksPerFile, 200000
Microsoft has a KnowledgeBase article that addresses this problem directly and describes the cause:
The page locks required for the transaction exceed the MaxLocksPerFile value, which defaults to 9500 locks. The MaxLocksPerFile setting is stored in the Windows registry.
The KnowledgeBase article says it applies to Access 2002 and 2003, but it worked for me when changing a field in an .mdb from Access 2013.
It's entirely possible that in a database of that size, you've got text data that won't convert to a valid Date/Time.
I would suggest (and you may hate me for this) that you export all those prospective date values from "Big" and go through them (perhaps in Excel) to see which ones are not formatted the way you'd expect.
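Before exporting, a quick way to spot the offenders from inside Access itself is a query built on the VBA IsDate() function (table and field names here are placeholders):

SELECT [Date Last Modified]
FROM BigTable
WHERE Not IsDate([Date Last Modified]);

Rows that are Null or blank will also show up here, since IsDate() returns False for them.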
Assuming that the error message is accurate, you're running up against a disk or memory limitation. Assuming that you have more than a couple of gigabytes free on your disk drive, my best guess is that rebuilding the table would put the database (including work space) over the 2 gigabyte per file limit in Access.
If that's the case you'll need to:
Unload the data into some convenient format and load it back into an empty database with an already existing table definition.
Move a subset of the data into a smaller table, change the data type in the smaller table, compact and repair the database, and repeat until all the data is converted.
If the error message is NOT correct (which is possible), the most likely cause is a bad or out-of-range date in your text-date column.
Copy the table (i.e. 'YourTable') then paste just its structure back into your database with a different name (i.e. 'YourTable_new').
Change the fields in the new table to what you want and save it.
Create an append query and copy all the data from your old table into the new one.
Hopefully Access will automatically convert the old text field directly to the correct value for the new Date/Time field. If not, you might have to clear out the table and re-append all the data, using a string-to-date function (such as CDate) to convert that one field when you do the append.
Also, if there is an autonumber field in the old table this might not work because there is no way to ensure that the old autonumber values will line up with the new autonumber values that get assigned.
You've been offered a bunch of different ways to get around the disk space error message.
Have you tried adding a new field to your existing table with the Date data type and then updating that field from the existing string date field? If that works, you can then delete the old field and rename the new one to the old name. That would probably take up less temp space than a direct in-place conversion from string to date, as sketched below.
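A sketch of that approach in Access SQL (the table name is hypothetical, and CDate() will fail on any row whose text isn't a recognizable date, which ties back to the bad-date warnings elsewhere in this thread). Run the two statements separately, since Access executes only one statement at a time:

ALTER TABLE BigTable ADD COLUMN DateLastModified2 DATETIME;

UPDATE BigTable
SET DateLastModified2 = CDate([Date Last Modified])
WHERE [Date Last Modified] Is Not Null;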
If it still doesn't work, you may be able to do it with a second table with two columns: the first a long integer (make it the primary key), the second a date. Append the PK and the string date field into this empty table. Then add a new date field to the existing table and, using a join, update the new field with the values from the two-column table.
This may run into the same problem. It depends on a number of things internal to the Jet/ACE database engine over which we have no real control.
