How to Dynamically render Table name and File name in pentaho DI - file

I have a requirement in which one source is a table and one source is a file. I need to join these both on a column. The problem is that I can do this for one table with one transformation but I need to do it for multiple set of files and tables to load into another set of specific files as target using the same transformation.
Breaking down my requirement more specifically :
Source Table Source File Target File
VOICE_INCR_REVENUE_PROFILE_0 VoiceRevenue0 ProfileVoice0
VOICE_INCR_REVENUE_PROFILE_1 VoiceRevenue1 ProfileVoice1
VOICE_INCR_REVENUE_PROFILE_2 VoiceRevenue2 ProfileVoice2
VOICE_INCR_REVENUE_PROFILE_3 VoiceRevenue3 ProfileVoice3
VOICE_INCR_REVENUE_PROFILE_4 VoiceRevenue4 ProfileVoice4
VOICE_INCR_REVENUE_PROFILE_5 VoiceRevenue5 ProfileVoice5
VOICE_INCR_REVENUE_PROFILE_6 VoiceRevenue6 ProfileVoice6
VOICE_INCR_REVENUE_PROFILE_7 VoiceRevenue7 ProfileVoice7
VOICE_INCR_REVENUE_PROFILE_8 VoiceRevenue8 ProfileVoice8
VOICE_INCR_REVENUE_PROFILE_9 VoiceRevenue9 ProfileVoice9
The table and file names are always corresponding i.e. VOICE_INCR_REVENUE_PROFILE_0 should always join with VoiceRevenue0 and the result should be stored in ProfileVoice0. There should be no mismatches in this case. I tried setting the variables with table names and file names, but it only takes on value at a time.
All table names and file names are constant. Is there any other way to get around this. Any help would be appreciated.

Try using "Copy rows to result" step. It will store all the incoming rows (in your case the table and file names) into a memory. And for every row, it will try to execute your transformation. In this way, you can read multiple filenames at one go.
Try reading this link. Its not the exact answer, but similar.
I have created a sample here. Please check if this is what is required.
In the first transformation, i read the tablenames and filenames and loaded it in the memory. After that i have used the get variable step to read all the files and table names to generate the output. [Note: I have not used table input as source anywhere, instead used TablesNames. You can replace the same with the table input data.]
Hope it helps :)

Related

Can SSIS import TXT records in exact same order as they are in TXT file? If not (by default), then how?

I have records in a flat file that have what you might consider to be master records with detail records following the master they relate to, until the master record.
Here is an example:
Order123, Customer1, 1/1/2018
Item1, $1
Item2, $1
Order124, Customer2, 1/1/2018
Item1, $1
Item4, $2
The file has no line numbers or any kind of sequencing built in, nor does it use foreign keys to relate master to detail.
If I were to use SSIS to import the raw TXT data into a flexible table with columns designed to take various datatypes (i.e. nvarchar(255), or similar), I could iterate over the values after the import, and relate the values in Line #2 and Line #3 with Order123; and consequently, Lines #5 and #6 with Order124.
The table holding the raw data will use a simple RecordID identity column with an integer incrementing by one.
It doesn't really matter, but if you're curious, the actual data I'm referring to is the Retrosheet data event files. It's a collection of all Major League Baseball data. A real file could be downloaded from a link on this page:
https://www.retrosheet.org/game.htm
I seem to recall that you could not import TXT data into a table and expect that the order of the rows would match the order of the TXT lines. When I do small tests of this, however, the records do appear in the same order as the source file. I suspect that my small test results were too good to be true and not a fail safe prediction on how it will turn out.
In summary:
How do I use SSIS to import data, inserting SQL records in the same order as the original flat file?
The answer is yes, flat files are processed in order, as long as you aren't applying any kind of sorting.
I was able to process the Retrosheet files by creating a table in my DB that had an identity column, and a varchar column long enough to hold each line of the file (I chose 100). I then set up my flat file connection with Ragged Right formatting, defining the row delimiter as {CR}{LF}.
I just typed this out so there might be a few errors in syntax but it should get you close.
You will need to set up 2 different outputs.
Order of load will not matter as you are adding a foreign key to the detail table.
public string orderNo; /// on the OUTSIDE
public main()
string[] lines = System.IO.File.ReadAllLines([filename]);
foreach(var line in lines)
{
string[] cols = line.Split(',');
if(cols.Length == 3)
{
orderNo = cols[0];
Output0Buffer.AddRow();
Output0Buffer.OrderNo = cols[0].ToString();
Output0Buffer.Customer = cols[1].ToString();
Output0Buffer.OrderDate = DateTime.Parse(cols[2].ToString().Trim());
}
else
{
Output1Buffer.AddRow();
Output1Buffer.OrderNo = orderNo;
Output1Buffer.Item = cols[0].ToString();
Output1Buffer.Amt = cols[1].ToString(); //This needs to be parsed later.
}
}
FOLLOW UP:
I just reviewed the site you are trying to download from. And the file is more complicated than you led on by your question.
Split still seems safe to use but you will have to trim some quotewrapped strings (names) but it looks like there are no quote wrapped commas (at least in the examples). If that is the case you need to use REGEX to split.
I would change the logic to use switch and case and base it on cols[0] being one of the 8 types.
Save ID on the outside and write to each of the 7 other possible datasets this creates for linkage to parent. You will have to use the same strategy for other records that need to be tied to a different parent (I think comment is an example).
GOOD LUCK with all this. Plays do not look easy to interpret!

Auto-generating destinations of split files in SSIS

I am working on my first SSIS package. I have a view with data that looks something like:
Loc Data
1 asd
1 qwe
2 zxc
3 jkl
And I need all of the rows to go to different files based on the Loc value. So all of the data rows where Loc = 1 should end up in the file named Loc1.txt, and the same for each other Loc.
It seems like this can be accomplished with a conditional split to flat file, but that would require a destination for each Location. I have a lot of Locations, and they all will be handled the same way other than being split in to different files.
Is there a built in way to do this without creating a bunch of destination components? Or can I at least use the script component to act as a way?
You should be able to set an expression using a variable. Define your path up to the directory and then set the variable equal to that column.
You'll need an Execute SQL task to return a Single Row result set, and loop that in a container for every row in your original result set.
I don't have access at the moment to post screenshots, but this link should help outline the steps.
So when your package runs the expression will look like:
'C:\Documents\MyPath\location' + #User::LocationColumn + '.txt'
It should end up feeding your directory with files according to location.
Set the User::LocationColumn equal to the Location Column in your result set. Write your result set to group by Location, so all your records write to a single file per Location.
I spent some time try to complete this task using the method #Phoenix suggest, but stumbled upon this video along the way.
I ended up going with the method shown in the video. I was hoping I wouldn't have to separate it in to multiple select statements for each location and an extra one to grab the distinct locations, but I thought the SSIS implementation in the video was much cleaner than the alternative.
Change the connection manager's connection string, in which you have to use variable which should be changed.
By varying the variable, destination file also changes
and connection string is :
'C:\Documents\ABC\Files\' + #User::data + '.txt'
vote this if it helps you

I need to extract three columns from a file

I am doing a project in Association Rule Mining. For my data set, I need to extract three columns from a text file.
Here is the link for the text file.
I need to extract Billno, Product and Batch columns and write them on to a text file.
The easiest way would be to use grep
http://www.dreamincode.net/forums/topic/290545-using-a-grep-command-in-a-c-program-in-linux/
Otherwise, that data is consistent, all the bill numbers are the same length and and appear to be easily Regex'ed
http://www.cplusplus.com/reference/regex/

How to retrieve the name of a file and store it in the database using SSIS package?

I'm doing an Excel loop through fifty or more Excel files. The loop goes through each Excel file, grabs all the data and inputs it into the database without error. This is the typical process of setting delay validation to true, and making sure that the expression for the Excel Connection is a string variable called EFile that is set to nothing (in the loop).
What is not working: trying to input the name of the Excel file into the database.
What's been tried (edit; SO changed my 2 to 1 - don't know why):
Add a derived column between the Excel file and database input, and add a column using the EFile expression (so under Expression in the Derived Column it would be #[User::EFile]). and add the empty. However, this inputs nothing a blank (nothing).
One suggestion was to add ANOTHER string variable and set its properties EvaluateAsExpression to True and set the Expression to the EFile variable (#[User::EFile]). The funny thing is that this does the same thing - inputs a blank into the database.
Numerous people on blogs claim they can do this, yet I haven't seen one actually address this (I have a blog and I will definitely be showing people how to do this when I get an answer because, so far, these others have fallen short). How do I grab an Excel file's name and input it in a database during a loop?
Added: Forgot to add, no scripts; the claim is that it can be done without them, so I want to see the solution without them.
Note: I already have the ability to import the data from the Excel files - that's easy (see my GitHub account, as I have two different projects for importing all sorts of txt, csv, xls, xlsx data). I am trying to also get the actual name of the file being imported also into the database. So, if there are fifty Excel files, along with the data in each file, the database will have the fifty file names alongside that data (so if each file has 1000 rows of data, each 1000 rows would also have the name of the file they came from next to them as an additional column). This point seems to cause a lot of confusion, as people assume I'm having trouble importing data in files - NOPE, see my GitHub; again that's easy. It's the FILENAME that needs to also be imported.
Test package: https://github.com/tmmtsmith/SSISLoopWithFileName
Solution: #jaimet pointed out that the Derived Column needed to be the #[User::CurrentFile] (see the test package). When I first ran the package, I still got a blank value in my database. But when we originally set up the connection, we do point it to an actual file (I call this "fooling the package"), then change the expression on the connecting later to the #[User::CurrentFile], which is blank. The Derived Column, using the variable #[User::CurrentFile], showed a string of 0. So, I removed the Derived Column, put the full file path and name in the variable, then added the variable to the Derived Column (which made it think the string was 91 characters long), then went back and set the variable to nothing (English teacher would hate the THENs about right now). When I ran the package, it inputted the full file path. Maybe, like the connection, it needs to initially think that a file exists in order for it to input the full amount of characters?
Appreciate all the help.
The issue is because of blank value in the variable #[User::FileNameInput] and this caused the SSIS package to assume that the value of this variable will always be of zero length in the Derived Column transformation.
Change the expression on the Derived column transformation from #[User::FileNameInput] to (DT_STR, 2000, 1252)#[User::FileNameInput].
Type casting the derived column to 2000 sets the column length to that maximum value. The value 1252 represents the code page. I assumed that you are using ANSI code page. I took the value 2000 from your table definition because the FilePath column had variable VARCHAR(2000). If the column data type had been NVARCHAR(2000), then the expression would be (DT_WSTR, 2000)#[User::FileNameInput]
Tim,
You're using the wrong variable in your Derived Column component. You are storing the filename in #[User::CurrentFile] but the variable that you're using in your Derived Column component is #[User::FileNameInput]
Change your Derived Column component to use #[User::CurrentFile] and you'll be good.
Hope that helps.
JT
If you are using a ForEach loop to process the files in a folder then I have have used the technique described in SSIS Junkie's blog to get the filename in to an SSIS variable: SSIS: Enumerating files in a Foreach loop
You can use the variable later in your flow to write it to the database.
TO all intents and purposes your method #1 should work. That's exactly how I would attempt to do it. I am baffled as to why it is not working. Could you perhaps share your package?
Tony, thanks very much for the link. Much appreciated.
Regards
Jamie

SSIS suitability

I'm tring to create an SSIS package to import some dataset files, however given that I seem to be hitting a brick
wall everytime I achieve a small part of the task I need to take a step back and perform a sanity check on what I'm
trying to achieve, and if you good people can advise whether SSIS is the way to go about this then I would
appreciate it.
These are my questions from this morning :-
debugging SSIS packages - debug.writeline
Changing an SSIS dts variables
What I'm trying to do is have a For..Each container enumerate over the files in a share on the SQL Server. For each
file it finds a script task runs to check various attributes of the filename, such as looking for a three letter
code, a date in CCYYMM, the name of the data contained therein, and optionally some comments. For example:-
ABC_201007_SalesData_[optional comment goes here].csv
I'm looking to parse the name using a regular expression and put the values of 'ABC', '201007', and
'SalesData' in variables.
I then want to move the file to an error folder if it doesn't meet certain criteria :-
Three character code
Six character date
Dataset name (e.g. SalesData, in this example)
CSV extension
I then want to lookup the Character code, the date (or part thereof), and the Dataset name against a lookup table
to mark off a 'checklist' of received files from each client.
Then, based on the entry in the checklist, I want to kick off another SSIS package.
So, for example I may have a table called 'Checklist' with these columns :-
Client code Dataset SSIS_Package
ABC SalesData NorthSalesData.dtsx
DEF SalesData SouthSalesData.dtsx
If anyone has a better way of achieving this I am interested in hearing about it.
Thanks in advance
That's an interesting scenario, and should be relatively easy to handle.
First, your choice of the Foreach Loop is a good one. You'll be using the Foreach File Enumerator. You can restrict the files you iterate over to be just CSVs so that you don't have to "filter" for those later.
The Foreach File Enumerator puts the filename (full path or just file name) into a variable - let's call that "FileName". There's (at least) two ways you can parse that - expressions or a Script Task. Depends which one you're more comfortable with. Either way, you'll need to create three variables to hold the "parts" of the filename - I'll call them "FileCode", "FileDate", and "FileDataset".
To do this with expressions, you need to set the EvaluateAsExpression property on FileCode, FileDate, and FileDataset to true. Then in the expressions, you need to use FINDSTRING and SUBSTRING to carve up FileName as you see fit. Expressions don't have Regex capability.
To do this in a Script Task, pass the FileName variable in as a ReadOnly variable, and the other three as ReadWrite. You can use the Regex capabilities of .Net, or just manually use IndexOf and Substring to get what you need.
Unfortunately, you have just missed the SQLLunch livemeeting on the ForEach loop: http://www.bidn.com/blogs/BradSchacht/ssis/812/sql-lunch-tomorrow
They are recording the session, however.

Resources