Spoon - read SQL code from txt file and execute on DB - sql-server

I'm learning to develop ETL processes with Pentaho Spoon, and I'm still pretty new to it.
Instead of storing SQL operations inside the transformation file, I'd like to keep them in their own .sql files. That makes it easier to track changes in Subversion, and if needed I can just open the .sql file in a DB manager and execute it directly.
How could I do that? I suppose I could use one component to read a text file into a variable, and another component to take that variable and execute it against the DB.
What's the simplest way to achieve that?

In the standard Table input step, you can define the query to be a parameter ${my_query}, and this parameter has to be defined (without the ${...} decoration) in the transformation properties: right-click anywhere, select Properties in the popup menu, then the Parameters tab.
Each time you run the transformation, you'll be presented with the list of parameters, among them my_query, which you can overwrite.
To automate this, follow the example shipped with the installation zip. In the same directory as your spoon.bat/spoon.sh there is a samples folder, in which you will find a job to read_all_files or read all_tables. Basically this job lists the files in a directory and, for each one, puts its content in a variable and uses it as a parameter to run the transformation. Much easier to do than to explain.

Related

Read file in SSIS Project into a variable

My SSIS projects tend to run queries that require changes as they move between environments: the table schema might change, or a value in the WHERE clause. I've always either put my SQL into a Project Parameter, which is hard to edit since formatting is lost, or put it directly into the Execute SQL Task/Data Flow Source and then manually edited it between migrations, which is also not ideal.
I was wondering, though: if I added my SQL scripts as files within the project, can these be read back in? For example, if I put a query like this:
select id, name from %schema%.tablename
I'd like to read this into a variable; then it's easy to use an expression, as I do with Project Parameters, to replace %schema% with the appropriate value. The .sql files within the project could then be edited with little effort, or even tested through an Execute SQL Task that's disabled/removed before the project goes into the deployment flow. But I've been unable to find how to read in a file using a relative path within the project, and I'm not even sure these files get deployed to the SSIS server.
Thanks for any insight.
I've added a text file query.sql to an SSIS (SQL 2017) project in Visual Studio, but I've found no way to pull the contents of query.sql into a variable.
Native tooling approach
For an Execute SQL Task, there's an option to source your query directly from a file.
Set your SQLSourceType to File Connection and then specify a file connection manager in the FileConnection section.
Do be aware that while this is handy, it's also ripe for someone escalating their permissions. If I had access to the file the SSIS package is looking for, I could add a DROP DATABASE, create a new user and give them SA rights, etc. Anything the account that runs the SSIS package can do, a nefarious person could exploit.
Roll your own approach
If you're adamant about reading the file yourself, add two Variables to your SSIS package and supply values like the following
User::QueryPath -> String -> C:\path\to\file.sql
User::QueryActual -> String -> SELECT 1;
Add a Script Task to the package. Specify User::QueryPath as a ReadOnly variable and User::QueryActual as a ReadWrite variable.
Within Main you'd need code like the following:
string filePath = this.Dts.Variables["User::QueryPath"].Value.ToString();
this.Dts.Variables["User::QueryActual"].Value = System.IO.File.ReadAllText(filePath);
The meat of the matter is System.IO.File.ReadAllText. Note that this doesn't check whether the file exists or whether you have permission to access it; it's just a barebones read of a file (and also open to the same injection challenges as the above method, except this way you own maintaining it rather than the fine engineers at Microsoft).
You can build your query by using both a Variable and a Parameter.
For example:
Parameter A: dbo
Build your Variable A (string type) as: "Select * FROM server.DB." + ParameterA + ".Table"
So if you need to change the schema, just changing Parameter A will give you the corresponding query in Variable A.
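The same substitution, sketched as a plain string expression (the server, database, and table names are placeholders from the example above):

```python
# Mirror of the SSIS expression
# "Select * FROM server.DB." + ParameterA + ".Table":
# splice the schema parameter into the query string.
def build_query(schema: str) -> str:
    return "Select * FROM server.DB." + schema + ".Table"
```

Changing the parameter value at deployment time then rewrites the query without touching the package itself.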

Create a Job for exporting a SQL Server view into Excel on a daily basis

I am using SQL Server; is there a way to create a scheduled job that takes a view and exports it into an Excel file every day?
With the addition that it creates a new folder named by the timestamp, and the file name has the timestamp as part of it as well, something like
C:/excel/221120170830/name221120170830.exl
I tried looking around but so far I couldn't find any way to do it.
Maybe I am missing something?
Yes, basically you need to combine 3 technologies:
SQL Server Agent Jobs
Powershell
Export Data Wizard/SSIS package
The idea is to create a job whose first step is a PowerShell script that checks whether the folder exists and, if not, creates it. The next step executes the SSIS package you have created, for example one generated with the Export Data Wizard.
The tricky part may be uniquely naming your Excel file, but you can first export the file to a temporary location and then, using another PowerShell step, rename it and move it into the correct folder.

SSIS Package with XML Configuration, Dynamic File Location Source

I have an SSIS package that picks up a file from a directory and imports it into a database; pretty straightforward stuff. The only issue is that the input file is named after the current date, e.g. \path\to\file\filename_010115.txt, where 010115 stands for January 1, 2015.
I was calling this package with a custom .bat file that set the connection manager for a Flat File Source to the current-date-formatted filename and passed it into dtexec.exe directly; however, our environment demands that we use XML configuration files, not .bat files. Is there a simple way to set the file source "prefix" in the XML and append the date-formatted filename before attempting to pick up the file?
Have you considered rearchitecting your approach? You're not going to be able to do this gracefully with a configuration, be it XML, table, environment, or registry key. Even in the simplest case, a table, a process would need to update that table with the current date before your SSIS package could even start. A scheduling tool like SQL Agent can run SQL commands, but if you're going the XML route, you're looking at a tiny little app or a PowerShell command to modify your configuration file. Again, that would have to modify your file every day, before the SSIS package begins, because the package sets values once at startup and never consults the configuration source again.
You could use an Expression in SSIS to make the current date part of your flat file source's connection string, but in the event you need to process more than one day's file, or reprocess yesterday's file, you're stuck: you'd either have to manually rename source files or change the system clock, and nobody's going to do that.
A more canonical approach is to use a Foreach (file) Loop Container. Point it at the source folder and find all the file(s) that match your pattern. I'm assuming here that you move processed files out of that folder. The Foreach Container finds all matching files and enumerates through the list, popping the current one into whatever variable you've chosen.
See a working example on
https://stackoverflow.com/a/19957728/181965

Foreach Container to loop through Multiple Excel File to load

I have built packages in the past that loop through multiple text files in a folder and load them into SQL Server tables.
Now I am asked to create a package which loops through multiple Excel files in a folder and loads them into a SQL Server table.
I went through the following steps to create this package, assuming it shouldn't be much different from my other packages that loop through multiple flat files.
Added an Execute SQL Task truncating my staging table (a simple TRUNCATE TABLE statement).
Added a Foreach Loop Container, selected the Foreach File Enumerator, and created a variable called File_Path with data type String.
Added a Data Flow Task.
Added an Excel Data Source and configured the Excel Connection Manager by selecting one Excel file in the folder. (At this point it is configured correctly, as it is not showing any red cross or warning messages.)
Then I selected the Excel File Connection Manager and, in the Properties window under Expressions, selected the ConnectionString property and used the user variable @[User::File_Path].
At this point the Excel Data Source is showing a red cross, as it needs further configuration.
I have tried a few things, like changing the Data Access Mode from Table Name to Table Name or View Name Variable and passing the variable @[User::File_Path], but it gives me an error.
Can someone please have a look and advise where I am going wrong and how I can fix it? Any advice or a pointer in the right direction is much appreciated.
Thank you.
You shouldn't put the expression on the ConnectionString property; put it on the ExcelFilePath property instead.

Is SSIS the way to achieve this functionality?

I have a manual process that needs to be automated. The steps are outlined below. Is SSIS the correct way to achieve all of them, especially the result-to-CSV, zip and email steps? Can this be done using the built-in SQL Server scheduler?
Connect to SQL Server: DARVIN,51401
Open SQL Query: o:\Status Report.sql
Choose database: AdventureWorks
In the menu bar above choose ‘Tools’, then ‘Options’.
When the pop-up appears, choose the tab named ‘Results’ and choose a results output format of Comma Delimited (CSV). Click ‘Apply’, then the ‘OK’ button.
Execute the query
Choose where to save the file. You can save these in O:\Reports. The file name format is: day^_^Report TSP MM-DD-YY
Let the query run 15-25 minutes.
When the query is complete, open the folder that you saved the report in, right-click on the report title and compress it to a zip file (right-click, Send To, Compressed (zipped) Folder).
Copy the saved file and put into: O:\zippedFiles\
Email support@adventureworks.com to let them know that you have placed a zip file at: O:\zippedFiles\
This is exactly what SSIS is for: automating data transfer processes.
The only step that might cause you problems is the zipping part. You can use a third-party library and write a custom Script Task that achieves what you want; you will have to do some VB.NET (or C#) for that part.
The rest of what you want is pretty straightforward in SSIS.
