Auto-generating destinations of split files in SSIS - sql-server

I am working on my first SSIS package. I have a view with data that looks something like:
Loc Data
1 asd
1 qwe
2 zxc
3 jkl
And I need all of the rows to go to different files based on the Loc value. So all of the data rows where Loc = 1 should end up in the file named Loc1.txt, and the same for each other Loc.
It seems like this can be accomplished with a conditional split to flat file, but that would require a destination for each Location. I have a lot of Locations, and they all will be handled the same way other than being split in to different files.
Is there a built in way to do this without creating a bunch of destination components? Or can I at least use the script component to act as a way?

You should be able to set an expression using a variable. Define your path up to the directory and then set the variable equal to that column.
You'll need an Execute SQL task to return a Single Row result set, and loop that in a container for every row in your original result set.
I don't have access at the moment to post screenshots, but this link should help outline the steps.
So when your package runs the expression will look like:
'C:\Documents\MyPath\location' + #User::LocationColumn + '.txt'
It should end up feeding your directory with files according to location.
Set the User::LocationColumn equal to the Location Column in your result set. Write your result set to group by Location, so all your records write to a single file per Location.

I spent some time try to complete this task using the method #Phoenix suggest, but stumbled upon this video along the way.
I ended up going with the method shown in the video. I was hoping I wouldn't have to separate it in to multiple select statements for each location and an extra one to grab the distinct locations, but I thought the SSIS implementation in the video was much cleaner than the alternative.

Change the connection manager's connection string, in which you have to use variable which should be changed.
By varying the variable, destination file also changes
and connection string is :
'C:\Documents\ABC\Files\' + #User::data + '.txt'
vote this if it helps you

Related

How does one split large result set from a Group By into multiple flat files?

I'm far away from an SSIS expert and I'm attempting to correct an error (unspecified in the messages) that began once I modified a variable to increase the size of the data accumulated and exported into a flat file. (Note variable was a date in the WHERE statement that limited the data returned from the SELECT.)
So in the data flow there's a GROUP BY component and I'm trying to find the appropriate component to put in between that and the flat file destination component to chop up the results. I figured there'd be something to export, say flatFile1.csv, flatFile2.csv, etc. based on a number of lines (so if I set a limit of 1-million lines and the results returned 3.5-million, I'd get 4 files with the last one containing 1/2-million lines) or perhaps a max file size with similar results.
Which component should I use from the toolbox to guarantee a manageable file size?
Is a script component the only way to be able to handle any size output? If so would it sit in between the Group By and the Flat File output components or would the script completely obviate the need for the Flat File output?

SSIS Script Component - get raw row data in data flow

I am processing a flat file in SSIS and one of the requirements is that if a given row contains an incorrect number of delimiters, fail the row but continue processing the file.
My plan is to load the rows into a single column in SQL server, but during the load, I’d like to test each row during the data flow to see if it has the right number of delimiters, and add a derived column value to store the result of that comparison.
I’m thinking I could do that with a script task component, but I’m wondering if anyone has done that before and what would be the best method? If a script task component would be the way to go, how do I access the raw row with its delimiters inside the script task?
SOLUTION:
I ended up going with a modified version of Holder's answer as I found that TOKENCOUNT() will not count null values per this SO answer. When two delimiters are not separated by a value, it will result in an incorrect count (at least for my purposes).
I used the following expression instead:
LEN(EntireRow) - LEN(REPLACE(EntireRow, "|", ""))
This results in the correct count of delimiters in the row, regardless of whether there's a value in a given field or not.
My suggestion is to use Derrived Column to do your test
And then add a Conditional Split to decide if you want to insert the rows or not.
Something like this:
Use the TokenCount function in the Derrived Column box to get number of columns like this: TOKENCOUNT(EntireRow,"|")

How to Dynamically render Table name and File name in pentaho DI

I have a requirement in which one source is a table and one source is a file. I need to join these both on a column. The problem is that I can do this for one table with one transformation but I need to do it for multiple set of files and tables to load into another set of specific files as target using the same transformation.
Breaking down my requirement more specifically :
Source Table Source File Target File
VOICE_INCR_REVENUE_PROFILE_0 VoiceRevenue0 ProfileVoice0
VOICE_INCR_REVENUE_PROFILE_1 VoiceRevenue1 ProfileVoice1
VOICE_INCR_REVENUE_PROFILE_2 VoiceRevenue2 ProfileVoice2
VOICE_INCR_REVENUE_PROFILE_3 VoiceRevenue3 ProfileVoice3
VOICE_INCR_REVENUE_PROFILE_4 VoiceRevenue4 ProfileVoice4
VOICE_INCR_REVENUE_PROFILE_5 VoiceRevenue5 ProfileVoice5
VOICE_INCR_REVENUE_PROFILE_6 VoiceRevenue6 ProfileVoice6
VOICE_INCR_REVENUE_PROFILE_7 VoiceRevenue7 ProfileVoice7
VOICE_INCR_REVENUE_PROFILE_8 VoiceRevenue8 ProfileVoice8
VOICE_INCR_REVENUE_PROFILE_9 VoiceRevenue9 ProfileVoice9
The table and file names are always corresponding i.e. VOICE_INCR_REVENUE_PROFILE_0 should always join with VoiceRevenue0 and the result should be stored in ProfileVoice0. There should be no mismatches in this case. I tried setting the variables with table names and file names, but it only takes on value at a time.
All table names and file names are constant. Is there any other way to get around this. Any help would be appreciated.
Try using "Copy rows to result" step. It will store all the incoming rows (in your case the table and file names) into a memory. And for every row, it will try to execute your transformation. In this way, you can read multiple filenames at one go.
Try reading this link. Its not the exact answer, but similar.
I have created a sample here. Please check if this is what is required.
In the first transformation, i read the tablenames and filenames and loaded it in the memory. After that i have used the get variable step to read all the files and table names to generate the output. [Note: I have not used table input as source anywhere, instead used TablesNames. You can replace the same with the table input data.]
Hope it helps :)

How to retrieve the name of a file and store it in the database using SSIS package?

I'm doing an Excel loop through fifty or more Excel files. The loop goes through each Excel file, grabs all the data and inputs it into the database without error. This is the typical process of setting delay validation to true, and making sure that the expression for the Excel Connection is a string variable called EFile that is set to nothing (in the loop).
What is not working: trying to input the name of the Excel file into the database.
What's been tried (edit; SO changed my 2 to 1 - don't know why):
Add a derived column between the Excel file and database input, and add a column using the EFile expression (so under Expression in the Derived Column it would be #[User::EFile]). and add the empty. However, this inputs nothing a blank (nothing).
One suggestion was to add ANOTHER string variable and set its properties EvaluateAsExpression to True and set the Expression to the EFile variable (#[User::EFile]). The funny thing is that this does the same thing - inputs a blank into the database.
Numerous people on blogs claim they can do this, yet I haven't seen one actually address this (I have a blog and I will definitely be showing people how to do this when I get an answer because, so far, these others have fallen short). How do I grab an Excel file's name and input it in a database during a loop?
Added: Forgot to add, no scripts; the claim is that it can be done without them, so I want to see the solution without them.
Note: I already have the ability to import the data from the Excel files - that's easy (see my GitHub account, as I have two different projects for importing all sorts of txt, csv, xls, xlsx data). I am trying to also get the actual name of the file being imported also into the database. So, if there are fifty Excel files, along with the data in each file, the database will have the fifty file names alongside that data (so if each file has 1000 rows of data, each 1000 rows would also have the name of the file they came from next to them as an additional column). This point seems to cause a lot of confusion, as people assume I'm having trouble importing data in files - NOPE, see my GitHub; again that's easy. It's the FILENAME that needs to also be imported.
Test package: https://github.com/tmmtsmith/SSISLoopWithFileName
Solution: #jaimet pointed out that the Derived Column needed to be the #[User::CurrentFile] (see the test package). When I first ran the package, I still got a blank value in my database. But when we originally set up the connection, we do point it to an actual file (I call this "fooling the package"), then change the expression on the connecting later to the #[User::CurrentFile], which is blank. The Derived Column, using the variable #[User::CurrentFile], showed a string of 0. So, I removed the Derived Column, put the full file path and name in the variable, then added the variable to the Derived Column (which made it think the string was 91 characters long), then went back and set the variable to nothing (English teacher would hate the THENs about right now). When I ran the package, it inputted the full file path. Maybe, like the connection, it needs to initially think that a file exists in order for it to input the full amount of characters?
Appreciate all the help.
The issue is because of blank value in the variable #[User::FileNameInput] and this caused the SSIS package to assume that the value of this variable will always be of zero length in the Derived Column transformation.
Change the expression on the Derived column transformation from #[User::FileNameInput] to (DT_STR, 2000, 1252)#[User::FileNameInput].
Type casting the derived column to 2000 sets the column length to that maximum value. The value 1252 represents the code page. I assumed that you are using ANSI code page. I took the value 2000 from your table definition because the FilePath column had variable VARCHAR(2000). If the column data type had been NVARCHAR(2000), then the expression would be (DT_WSTR, 2000)#[User::FileNameInput]
Tim,
You're using the wrong variable in your Derived Column component. You are storing the filename in #[User::CurrentFile] but the variable that you're using in your Derived Column component is #[User::FileNameInput]
Change your Derived Column component to use #[User::CurrentFile] and you'll be good.
Hope that helps.
JT
If you are using a ForEach loop to process the files in a folder then I have have used the technique described in SSIS Junkie's blog to get the filename in to an SSIS variable: SSIS: Enumerating files in a Foreach loop
You can use the variable later in your flow to write it to the database.
TO all intents and purposes your method #1 should work. That's exactly how I would attempt to do it. I am baffled as to why it is not working. Could you perhaps share your package?
Tony, thanks very much for the link. Much appreciated.
Regards
Jamie

SSIS suitability

I'm tring to create an SSIS package to import some dataset files, however given that I seem to be hitting a brick
wall everytime I achieve a small part of the task I need to take a step back and perform a sanity check on what I'm
trying to achieve, and if you good people can advise whether SSIS is the way to go about this then I would
appreciate it.
These are my questions from this morning :-
debugging SSIS packages - debug.writeline
Changing an SSIS dts variables
What I'm trying to do is have a For..Each container enumerate over the files in a share on the SQL Server. For each
file it finds a script task runs to check various attributes of the filename, such as looking for a three letter
code, a date in CCYYMM, the name of the data contained therein, and optionally some comments. For example:-
ABC_201007_SalesData_[optional comment goes here].csv
I'm looking to parse the name using a regular expression and put the values of 'ABC', '201007', and
'SalesData' in variables.
I then want to move the file to an error folder if it doesn't meet certain criteria :-
Three character code
Six character date
Dataset name (e.g. SalesData, in this example)
CSV extension
I then want to lookup the Character code, the date (or part thereof), and the Dataset name against a lookup table
to mark off a 'checklist' of received files from each client.
Then, based on the entry in the checklist, I want to kick off another SSIS package.
So, for example I may have a table called 'Checklist' with these columns :-
Client code Dataset SSIS_Package
ABC SalesData NorthSalesData.dtsx
DEF SalesData SouthSalesData.dtsx
If anyone has a better way of achieving this I am interested in hearing about it.
Thanks in advance
That's an interesting scenario, and should be relatively easy to handle.
First, your choice of the Foreach Loop is a good one. You'll be using the Foreach File Enumerator. You can restrict the files you iterate over to be just CSVs so that you don't have to "filter" for those later.
The Foreach File Enumerator puts the filename (full path or just file name) into a variable - let's call that "FileName". There's (at least) two ways you can parse that - expressions or a Script Task. Depends which one you're more comfortable with. Either way, you'll need to create three variables to hold the "parts" of the filename - I'll call them "FileCode", "FileDate", and "FileDataset".
To do this with expressions, you need to set the EvaluateAsExpression property on FileCode, FileDate, and FileDataset to true. Then in the expressions, you need to use FINDSTRING and SUBSTRING to carve up FileName as you see fit. Expressions don't have Regex capability.
To do this in a Script Task, pass the FileName variable in as a ReadOnly variable, and the other three as ReadWrite. You can use the Regex capabilities of .Net, or just manually use IndexOf and Substring to get what you need.
Unfortunately, you have just missed the SQLLunch livemeeting on the ForEach loop: http://www.bidn.com/blogs/BradSchacht/ssis/812/sql-lunch-tomorrow
They are recording the session, however.

Resources