I have a stored procedure that returns a huge record set. I need to generate multiple CSV files via SSIS, each holding a fixed number of records, until all the records returned by the procedure have been written. For example, if the stored procedure returns 1 million records, I want to generate 10 CSV files with 100,000 records each. The number of CSV files generated should depend on the record count we choose for each file. What is the best way to achieve this via SSIS?
I do not understand how loops can be used to achieve this.
The link below served as a guidepost and helped me design a solution. I made a few changes in the implementation, but the design was very helpful and worked nicely.
http://social.technet.microsoft.com/wiki/contents/articles/3172.split-a-flat-text-file-into-multiple-flat-text-files-using-ssis.aspx
Thanks to the article author.
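Outside SSIS, the chunking logic itself can be sketched in a few lines of Python. This is only an illustration of the idea, not the approach from the linked article: the function name `write_in_chunks`, the `part_0001.csv` naming scheme, and the demo rows are all made up, and the rows would in practice come from the stored procedure's result set.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def write_in_chunks(rows, header, out_dir, rows_per_file):
    """Write rows to numbered CSV files holding at most rows_per_file rows each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = iter(rows)
    written = []
    file_no = 1
    while True:
        # Pull the next batch without loading the whole result set into memory.
        chunk = list(islice(rows, rows_per_file))
        if not chunk:
            break  # the source is exhausted
        path = out_dir / f"part_{file_no:04d}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(chunk)
        written.append(path)
        file_no += 1
    return written


# Demo with 25 made-up rows and 10 rows per file.
with tempfile.TemporaryDirectory() as tmp:
    demo_rows = ((i, f"name{i}") for i in range(25))
    files = write_in_chunks(demo_rows, ["id", "name"], tmp, 10)
    data_row_counts = [
        len(p.read_text(encoding="utf-8").splitlines()) - 1 for p in files
    ]
    print([p.name for p in files], data_row_counts)
```

The number of files follows from the chosen `rows_per_file`; the last file simply holds whatever is left over.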
I have been searching the internet for a solution to my problem but cannot seem to find any information. I have a single large text file (10 million rows), and I need to create an SSIS package to load these records into different tables based on the transaction group assigned to each record. That is, Tx_grp1 records would go into the Tx_Grp1 table, Tx_Grp2 records into the Tx_Grp2 table, and so forth. There are 37 different transaction groups in the single delimited text file, and records are written to the file in the order in which they occurred (by time). Also, each transaction group has a different number of fields.
Sample data file
date|tx_grp1|field1|field2|field3
date|tx_grp2|field1|field2|field3|field4
date|tx_grp10|field1|field2
.......
Any suggestion on how to proceed would be greatly appreciated.
This can be done in SSIS; it just takes some experience. Here are the main steps, with some discussion:
Define a Flat File data source for your file, describing all columns. A possible problem here is that fields may have different data types depending on the tx_group value. If so, I would declare all fields as strings that are long enough and convert their types later in the data flow.
Create an OLE DB connection manager for the database that will store the results.
Create a main data flow to process the file, and add a Flat File Source to it.
Add a Conditional Split to the output of the Flat File Source, and define as many conditions and outputs as you have transaction groups.
For each transaction group output, add a Data Conversion for the fields if necessary. Note that you cannot change the data type of an existing column; if you need to cast a string to an int, create a new column.
Add an OLE DB Destination for each destination table. Connect it to the matching transaction group output and map the fields.
Basically, you are done. Test the package thoroughly on a test database before using it on a production database.
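To make the routing step concrete, here is a rough Python sketch of what the Conditional Split does with the pipe-delimited sample: it reads the second field and sends the record to that group's output. It is not SSIS code; the function name `split_by_group`, the `unmatched` bucket (standing in for a default output), and the sample values are assumptions for illustration.

```python
from collections import defaultdict


def split_by_group(lines, delimiter="|"):
    """Group delimited records by the value in their second field."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split(delimiter)
        if len(fields) < 2:
            # No transaction group present: keep the raw record aside.
            groups["unmatched"].append(fields)
            continue
        # fields[0] is the date, fields[1] the transaction group,
        # and the remaining fields vary in number per group.
        groups[fields[1]].append([fields[0]] + fields[2:])
    return dict(groups)


# Made-up records in the same layout as the sample data file.
sample = [
    "2014-01-01|tx_grp1|a|b|c",
    "2014-01-01|tx_grp2|a|b|c|d",
    "2014-01-02|tx_grp10|a|b",
    "2014-01-03|tx_grp1|x|y|z",
]
routed = split_by_group(sample)
for group, records in routed.items():
    print(group, len(records))
```

Each key of `routed` corresponds to one output of the split, which would then feed its own destination table.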
I am trying to completely automate this process, and I'm wondering whether it's viable or efficient to do in VBA.
The report process involves two files: one SQL file and one Excel file.
The SQL file has the algorithm, and the final step is a query whose result is then pasted into the Excel file.
The algorithm is simpler (than what the audience might be used to) but has two "into" commands and several "update" commands.
Of the two "into" commands, the first grabs a small portion (constrained to the first and last day of the previous month) of a 500m+ record table. The second joins the first table with an eligibility type table.
After the second table is created, a series of UPDATE commands changes existing data in existing columns.
Then a series of ALTER and UPDATE commands adds new columns to the [second] table and fills them with the desired data.
The final step is a query whose results are copy-pasted into Excel (as is, no formatting changes necessary).
I'm not too well versed in VBA/VB.NET, T-SQL stored procedures, or dynamic SQL. If the SQL algorithm were a simple pull query with no table creation, I could build something to automate it. But the SQL has two table creations and about a dozen ALTER and UPDATE commands.
Am I stirring up the wrong nest? Should I keep running it manually as is?
You can definitely automate this. I created a report that ran two stored procedures and numerous queries with temp tables, including both UPDATE and ALTER commands, then used VBA to execute these and aggregate the data in the final summary sheet.
There is a ton of documentation out there. You can even pass your values to the stored procedure after the user inputs them.
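The overall shape (run the multi-statement script, then export the final query) can be sketched in Python. This is a stand-in, not VBA and not T-SQL: it uses the standard-library `sqlite3` module with an in-memory database, `CREATE TABLE ... AS` replaces T-SQL's `SELECT ... INTO` (which SQLite lacks), and every table, column, and value below is invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical script: two table creations, then UPDATE and ALTER steps.
STEPS = """
CREATE TABLE claims (id INTEGER, member TEXT, service_date TEXT, amount REAL);
INSERT INTO claims VALUES
    (1, 'm1', '2014-05-03', 100.0),
    (2, 'm2', '2014-05-20', 250.0),
    (3, 'm1', '2014-06-01', 75.0);
CREATE TABLE eligibility (member TEXT, elig_type TEXT);
INSERT INTO eligibility VALUES ('m1', 'A'), ('m2', 'B');

-- Step 1: pull one month of the big table into a working table.
CREATE TABLE work1 AS
    SELECT * FROM claims
    WHERE service_date BETWEEN '2014-05-01' AND '2014-05-31';

-- Step 2: join the working table to the eligibility table.
CREATE TABLE work2 AS
    SELECT w.id, w.member, w.amount, e.elig_type
    FROM work1 AS w JOIN eligibility AS e ON e.member = w.member;

-- Step 3: change existing data, then add and fill a new column.
UPDATE work2 SET amount = amount * 2 WHERE elig_type = 'B';
ALTER TABLE work2 ADD COLUMN band TEXT;
UPDATE work2 SET band = CASE WHEN amount >= 200 THEN 'high' ELSE 'low' END;
"""


def run_report(conn):
    """Run every step in order, then return the final query's header and rows."""
    conn.executescript(STEPS)
    cursor = conn.execute(
        "SELECT id, member, amount, elig_type, band FROM work2 ORDER BY id"
    )
    header = [col[0] for col in cursor.description]
    return header, cursor.fetchall()


conn = sqlite3.connect(":memory:")
header, rows = run_report(conn)
conn.close()

# Write the result as CSV text, which Excel can open without reformatting.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```

The point is that table creation and ALTER/UPDATE steps are no obstacle to automation: they are executed in sequence like any other statement, and only the last query's result needs to reach the spreadsheet.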
I know this may be a simple task, but I have yet to find a simple answer. I have a large SQL table that I want to export into multiple flat files (.csv, to be exact) of 10,000 records each. I want to do this using SSIS, and from what I gather I will need a Foreach Loop container. This is as far as I have got. As an added complication, a few of the columns have commas in the data itself, so when the file is delimited by commas, those values must be preserved with their original commas intact.
All the videos I have come across use scripts, or split the files by the type of data or in some other way. I just want CSV files with a set number of records in each. Any help is much appreciated.
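On the embedded-comma problem specifically, the usual CSV convention is to wrap such values in double quotes; in SSIS that corresponds to setting a text qualifier on the flat file connection. A small Python sketch with invented sample rows shows the round trip:

```python
import csv
import io

# Made-up rows: two values contain commas, one contains double quotes.
rows = [
    (1, "Smith, John", "12 Main St, Apt 4"),
    (2, "Doe", 'He said "hi"'),
]

buffer = io.StringIO(newline="")
# QUOTE_MINIMAL quotes only the fields that need it (delimiter, quote, newline).
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the original values, commas included.
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed)
```

Any consumer that honours the quoting reads `Smith, John` back as a single field.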
I need to create an SSIS package to load data from a CSV file. The tricky part is that some of the columns need to be stored as rows. I had better explain it with the example below.
From CSV file to Table in a different format as shown below
Is this possible within SSIS or using SQL Server?
What you seek is called unpivot.
Please see this MSDN blog post for an example. To drop 0 values, you can use a conditional split, and push the 0 values to a garbage output.
http://blogs.msdn.com/b/dataaccesstechnologies/archive/2014/05/22/unvipot-transformation-with-a-combination-of-single-and-multiple-destination-columns.aspx
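As a rough illustration of what unpivot does, here is a Python sketch that turns value columns into rows and drops the 0 values along the way. The original example data is not shown in the question, so the column names (`id`, `q1`, `q2`, `q3`), the output names (`attribute`, `value`), and the function `unpivot` are all hypothetical.

```python
def unpivot(rows, id_columns, value_columns, drop_value=0):
    """Turn each value column of each wide row into its own long row."""
    long_rows = []
    for row in rows:
        for column in value_columns:
            value = row[column]
            if value == drop_value:
                continue  # plays the role of the "garbage" output
            record = {name: row[name] for name in id_columns}
            record["attribute"] = column
            record["value"] = value
            long_rows.append(record)
    return long_rows


# Made-up wide rows: one row per id, one column per measure.
wide = [
    {"id": 1, "q1": 10, "q2": 0, "q3": 7},
    {"id": 2, "q1": 0, "q2": 5, "q3": 0},
]
long_rows = unpivot(wide, ["id"], ["q1", "q2", "q3"])
for record in long_rows:
    print(record)
```

Two wide rows with six measures become three long rows, because the three zero values are discarded.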
I have several CSV files, each with a corresponding table in the database that has the same name as the CSV and the same columns (with appropriate data types). So every CSV has a table in the database.
I need to map all of them dynamically. When I run the mapping, the data from all the CSV files should be transferred to the corresponding tables. I don't want a separate mapping for every CSV.
Is this possible in Informatica?
Appreciate your help.
PowerCenter does not provide such a feature out of the box. Unless the structures of the source files and target tables are the same, you need to define separate source/target definitions and create mappings that use them.
However, you can use Stage Mapping Generator to generate a mapping for each file automatically.
My understanding is that you have many CSV files with different column layouts and need to load them into the appropriate tables in the database.
Approach 1: If you use any RDBMS, you should have some kind of import option. Explore that route to create tables based on the CSV files. This is a manual task.
Approach 2: Open the CSV file and write formulae that use the header to generate a CREATE TABLE statement. Execute the formula result in your database, so that you end up with many tables created. Now use Informatica to read the CSVs, import all the tables, and load the data into them.
Approach 3: Use Informatica. You need to do a lot of coding to create a dynamic mapping on the fly.
Proposed solution:
Mapping 1:
1. Read the CSV file and pass the header information to a Java transformation.
2. The Java transformation should normalize the header and split its columns into rows. You can write them to a text file.
3. Now you have all the columns in a text file. Read this text file and use a SQL transformation to create the tables in the database.
Mapping 2:
Now that the table is available, read the CSV file excluding the header and load the data into the above table, created by mapping 1, via a SQL transformation (insert statement).
You can follow this approach for all the CSV files. I haven't tried this solution myself, but I am sure that the above approach would work.
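The header-to-DDL step that approach 2 and mapping 1 rely on can be sketched in Python. This is a minimal illustration rather than Informatica or Java transformation code: the function name `create_table_sql`, the blanket `VARCHAR(255)` column type, and the sample file contents are assumptions, and the table name is taken as trusted input.

```python
import csv
import io
import re


def create_table_sql(table_name, csv_text, column_type="VARCHAR(255)"):
    """Build a CREATE TABLE statement from the header row of CSV text."""
    header = next(csv.reader(io.StringIO(csv_text, newline="")))
    columns = []
    for name in header:
        # Replace runs of non-word characters so the header is a valid identifier.
        clean = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
        columns.append(f"    {clean} {column_type}")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(columns) + "\n);"


# Made-up file contents; only the first line is used.
sample_csv = "Order ID,Customer Name,Amount\n1,Acme,10.5\n"
ddl = create_table_sql("orders", sample_csv)
print(ddl)
```

Every column is created as a string here; real data types would have to come from elsewhere, since a header row alone does not carry them.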
If you're not using any transformations, it's wise to use the import option of the database (e.g. a BTEQ script in Teradata). But if you are doing transformations, you have to create as many sources and targets as you have files.
On the other hand, you can achieve this in one mapping:
1. Create a separate flow for every file (i.e. Source-Transformation-Target) in the single mapping.
2. Use the target load plan to choose which file gets loaded first.
3. Configure the file names and corresponding database table names in the session for that mapping.
If all the mappings (if you have to create them separately) are the same, use the indirect file method. In the session properties, under the Mapping tab, in the source options, you will find this setting. The default is Direct; change it to Indirect.
I don't have the tool at hand to explore further and guide you clearly, but look into this Indirect file load type in Informatica. I am sure that this will solve the requirement.
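The idea behind an indirect source is that the session reads a list file naming the real data files, which all share one structure. A Python sketch of that idea, with made-up file names and contents and a hypothetical `read_indirect` helper (it is not how Informatica implements the feature):

```python
import csv
import tempfile
from pathlib import Path


def read_indirect(list_file):
    """Yield data rows from every CSV named in list_file, skipping each header."""
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if not name:
            continue
        with open(name, newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row
            yield from reader


# Demo: two small files with the same layout, plus a list file naming them.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "a.csv").write_text("id,name\n1,x\n2,y\n", encoding="utf-8")
    (tmp / "b.csv").write_text("id,name\n3,z\n", encoding="utf-8")
    file_list = tmp / "files.txt"
    file_list.write_text(f"{tmp / 'a.csv'}\n{tmp / 'b.csv'}\n", encoding="utf-8")
    all_rows = list(read_indirect(file_list))

print(all_rows)
```

Because every listed file passes through the same reader, this only fits the case where the files have identical layouts, which matches the condition stated above.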
I have written a workflow in Informatica that does this, but some of the complex steps are handled inside the database. The workflow watches a folder for new files. Once it sees all the files that constitute a feed, it starts to process the feed. It takes a backup in a timestamped folder and then copies all the data from the files in the feed into an Oracle table. An Oracle procedure then transfers the data from the Oracle table into the corresponding destination staging tables and finally the data warehouse. So if I have to add a new file or feed, I only have to change the configuration tables. No changes are required to either the Informatica objects or the database objects. So the short answer is yes, this is possible, but it is not an out-of-the-box feature.