I need to create an SSIS package to load data from a CSV file. The tricky part is that some of the columns need to be stored as rows. I'd better explain it with the example below.
From the CSV file to a table in a different format, as shown below.
Is this possible within SSIS or using SQL Server?
What you seek is called unpivot.
Please see this MSDN blog post for an example. To drop 0 values, you can use a conditional split, and push the 0 values to a garbage output.
http://blogs.msdn.com/b/dataaccesstechnologies/archive/2014/05/22/unvipot-transformation-with-a-combination-of-single-and-multiple-destination-columns.aspx
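To make the idea concrete outside of SSIS, here is a minimal Python sketch of what an unpivot followed by a "drop the zeros" split does. The column names (id, jan, feb, mar) and the sample data are made up for illustration, since the original example is not shown here; this is not the SSIS Unpivot transformation itself, only the same logic.

```python
import csv
import io

def unpivot(rows, key_columns, value_columns, drop_zero=True):
    """Turn each value column of each input row into its own output row."""
    out = []
    for row in rows:
        for col in value_columns:
            value = row[col]
            # Mimics the conditional split: zero values go to the "garbage" path.
            # Assumes every value column holds a numeric string.
            if drop_zero and float(value) == 0:
                continue
            record = {k: row[k] for k in key_columns}
            record["attribute"] = col   # name of the original column
            record["value"] = value     # value that column held
            out.append(record)
    return out

# Hypothetical sample data: one key column and three columns to unpivot.
SAMPLE = "id,jan,feb,mar\n1,10,0,5\n2,0,0,7\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
result = unpivot(rows, ["id"], ["jan", "feb", "mar"])
for r in result:
    print(r)
```

Each wide input row becomes one narrow row per non-zero column, which is the shape the Unpivot transformation plus a Conditional Split would produce.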
Related
I have been searching the internet for a solution to my problem, but I cannot find any information. I have a single large text file (10 million rows), and I need to create an SSIS package that loads these records into different tables based on the transaction group assigned to each record. That is, Tx_grp1 records would go into the Tx_Grp1 table, Tx_grp2 records into the Tx_Grp2 table, and so forth. There are 37 different transaction groups in the single delimited text file, and records are written to the file in the order in which they occurred (by time). Also, each transaction group has a different number of fields.
Sample data file
date|tx_grp1|field1|field2|field3
date|tx_grp2|field1|field2|field3|field4
date|tx_grp10|field1|field2
.......
Any suggestion on how to proceed would be greatly appreciated.
This task can be solved with SSIS; it just takes some experience. Here are the main steps, with some discussion:
Define a Flat File data source for your file, describing all columns. A possible problem here is that fields may have different data types depending on the tx_group value. If that is the case, I would declare all fields as strings that are long enough and convert their types later in the data flow.
Create an OLE DB connection manager for the database you will use to store the results.
Create a main data flow in which you will process the file, and add a Flat File Source.
Add a Conditional Split to the output of the Flat File Source, and define as many conditions and outputs as you have transaction groups.
For each transaction group output, add a Data Conversion for the fields if necessary. Note that you cannot change the data type of an existing column; if you need to cast a string to an int, create a new column.
For each destination table, add an OLE DB Destination. Connect it to the matching transaction group output and map the fields.
Basically, you are done. Test the package thoroughly on a test DB before using it on a production DB.
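As a rough illustration of what the Conditional Split does in the steps above, here is a small Python sketch that routes each pipe-delimited record to a bucket named after its transaction group. The sample values are invented; only the layout (date, group, then a variable number of fields) comes from the question. In the real package each bucket would be a separate output feeding its own destination table.

```python
from collections import defaultdict

def split_by_group(lines, delimiter="|"):
    """Route each record to a list keyed by its transaction group."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(delimiter)
        # Field 0 is the date and field 1 is the transaction group,
        # as in the sample file layout; the rest are group-specific fields.
        groups[fields[1]].append([fields[0]] + fields[2:])
    return dict(groups)

# Hypothetical sample records following the layout from the question.
SAMPLE_LINES = [
    "2014-01-01|tx_grp1|a|b|c",
    "2014-01-01|tx_grp2|a|b|c|d",
    "2014-01-02|tx_grp10|a|b",
    "2014-01-02|tx_grp1|x|y|z",
]
routed = split_by_group(SAMPLE_LINES)
for group, records in routed.items():
    print(group, records)
```

All values stay strings here, which mirrors the advice to declare every field as a string first and convert types per group afterwards.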
We have a large production MSSQL database (the .mdf is approx. 400 GB), and I have a test database. All the tables, indexes, views, etc. are identical in both. I need to make sure that the data in the tables of these two databases is consistent, so every night I need to insert all the new rows and apply all the updated rows from production to the test DB.
I came up with the idea of using SSIS packages to keep the data consistent by checking for updated and new rows in all the tables. My SSIS flow is:
I have a separate SSIS package for each table because;
In order:
1. I get the timestamp value in the table so that I fetch only the last day's rows instead of the whole table.
2. I get those rows from the table in production.
3. Then I use the Lookup transformation to compare this data with the data in the test database table.
4. Then I use a Conditional Split to determine whether each row is new or updated.
5_1. If the row is new, I insert it into the destination.
5_2. If the row is updated, I update it in the destination table.
The data flow is in the MTRule and STBranch packages in the picture.
The problem is that I am recreating this same flow for each table, and I have more than 300 tables like this. It takes hours and hours :(
What I am asking is:
Is there any way to do this dynamically in SSIS?
PS: Every table has its own columns and PK values, but my data flow schema is always the same (below).
You can look into BimlScript, which lets you create packages dynamically based on metadata.
I believe the best way to achieve this is to use expressions. They let you set the source and destination dynamically.
One possible solution might be as follows:
Create a table that stores all your table names and PK columns.
Define a package that loops through this table and builds a SQL statement for each row.
Call your main package and pass the statement to it.
Use the statement as the data source for your data flow.
If applicable, pass the destination table as a parameter as well (another column in your config table).
This is how I processed several really huge tables: the data had to be fetched from 20 tables and moved to one single table.
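The statement-building step can be sketched roughly like this in Python. This only shows the string-building logic; in the package it would be an expression or a script inside the loop. The config entries use the MTRule and STBranch table names from the question, but the PK column names (RuleId, BranchId) and the shape of the config rows are assumptions made up for the example.

```python
# Hypothetical config rows; in practice these would come from the config table.
CONFIG = [
    {"source_table": "MTRule", "pk_columns": ["RuleId"], "destination_table": "MTRule"},
    {"source_table": "STBranch", "pk_columns": ["BranchId"], "destination_table": "STBranch"},
]

def quote(name):
    """Bracket-quote a SQL Server identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"

def build_statement(entry):
    """Build the source query for one config row, ordered by its PK columns."""
    order_by = ", ".join(quote(c) for c in entry["pk_columns"])
    return f"SELECT * FROM {quote(entry['source_table'])} ORDER BY {order_by}"

# One (destination table, statement) pair per config row, i.e. what the
# looping package would pass to the main package on each iteration.
statements = [(e["destination_table"], build_statement(e)) for e in CONFIG]
for destination, stmt in statements:
    print(destination, "<-", stmt)
```

Quoting the identifiers matters because the table and column names come from data rather than from the package design.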
Why do you need to use SSIS?
You are better off writing a stored procedure that takes the tablename as parameter and doing your CRUD there. Then call the stored procedure in a FOR EACH component in SSIS.
In fact you might be able to do everything using a Stored Procedure and scheduling it in a SQL Agent Job.
I have a stored procedure that returns a huge record set. My requirement is to generate multiple CSV files via SSIS, each containing a chosen number of records, until all the records returned by the procedure have been written. For example, if the stored procedure returns 1 million records, I want to generate 10 CSV files with 100,000 records each. The number of CSV files generated should depend on the record count we choose for each file. What is the best way to achieve this via SSIS?
I did not get how loops can be used to achieve this.
The link below served as a guidepost and helped me design a solution. I made a few changes in the implementation, but the design is very helpful and worked nicely.
http://social.technet.microsoft.com/wiki/contents/articles/3172.split-a-flat-text-file-into-multiple-flat-text-files-using-ssis.aspx
Thanks to the article author.
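For readers wondering how a loop fits in, the underlying idea can be sketched in a few lines of Python: take the records in batches of a fixed size and write each batch to its own numbered file. This is not the design from the linked article, only a minimal illustration; the file name pattern (export_001.csv and so on) and the sample records are assumptions.

```python
import csv
import os
import tempfile
from itertools import islice

def chunked(records, size):
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def export_in_chunks(records, rows_per_file, out_dir, prefix="export"):
    """Write one CSV file per batch and return the paths that were written."""
    paths = []
    for index, batch in enumerate(chunked(records, rows_per_file), start=1):
        path = os.path.join(out_dir, f"{prefix}_{index:03d}.csv")
        with open(path, "w", newline="") as handle:
            csv.writer(handle).writerows(batch)
        paths.append(path)
    return paths

# Hypothetical stand-in for the stored procedure's result set: 25 records,
# 10 per file, so the last file holds the remaining 5.
records = ((i, f"row {i}") for i in range(25))
with tempfile.TemporaryDirectory() as out_dir:
    paths = export_in_chunks(records, 10, out_dir)
    file_names = [os.path.basename(p) for p in paths]
print(file_names)
```

The number of files follows from the chosen batch size, so nothing needs to know the total record count in advance.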
I have an Excel file (xlsx) containing a table:
Once I ran my SSIS task (successfully) to insert data into it, the data was actually appended after the table:
My expected result:
So I am looking for a way to insert into the table and have it expand with the data. I hope someone can help me.
I would not use SSIS for this. You could set up Excel 2007 as a linked server and put data into Excel with regular T-SQL, or process the data with Excel VBA, getting it directly from SQL Server. As a matter of practical sanity, I would never use SSIS for anything.
Well, there is not much information about how you are doing it, but you should somehow specify that the first row is not to be treated as the header row (HDR=NO), something like:
insert into OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=D:\testing.xls;HDR=NO',
'SELECT * FROM [Sheet1$]')
I finally found an answer.
So I needed to generate Excel reports with a lot of pivot charts linked to a main table.
But using a table was a bad idea. Instead, the pivot charts must be linked to a named range.
The last thing to know is that the error message "Invalid References" appears if the named range doesn't use the OFFSET function.
My named range formula is :
=OFFSET(Sheet!$A$1, 0, 0, COUNTA(Sheet!$A:$A), NUMBER_OF_COLUMNS)
Where Sheet is the name of the worksheet and NUMBER_OF_COLUMNS is the number of columns of the data.
That's it. I can now generate Excel reports without a single line of code, using only SSIS 2005.
My problem is as follows. I have a CSV file (~100k rows) containing history information with the column format of:
ID1,History1,ID2,History2...ID110,History110
Each row may have anywhere between 0 and 110 history entries. Each separate entry requires a stored procedure to be called.
If there were a small number of possible entries per row, I imagine the way to do this would be to transform the data using a script, and send it to a unique path. Creating 110 paths would probably work, but isn't very elegant (and quite time consuming).
What would the best way to approach this be?
Just load the data (raw csv unchanged, one row per file line) into a staging table. Then, call a stored procedure that will use a string splitter to break up and loop over the staging table rows and call your other procedure for each history entry.
see: Arrays and Lists in SQL Server 2005 and Beyond
also see this previous answer: SQL comma delimted column => to rows then sum totals?
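To show what the splitting step amounts to, here is a minimal Python sketch that breaks one raw line into (ID, History) pairs. In the approach above this would be done by the string splitter inside the stored procedure; the sample line and the rule that an empty pair means "no entry" are assumptions for illustration.

```python
import csv

def history_entries(line):
    """Yield (id, history) pairs from one raw CSV line of alternating columns."""
    # csv.reader handles quoted values; an empty line yields no fields.
    fields = next(csv.reader([line]), [])
    for i in range(0, len(fields) - 1, 2):
        entry_id, history = fields[i], fields[i + 1]
        if entry_id == "" and history == "":
            continue  # assumed convention: an empty pair means no history entry
        yield entry_id, history

# Hypothetical staging row with two filled pairs and one empty pair.
for entry_id, history in history_entries("1,created,2,updated,,"):
    # In the real solution, this is where the per-entry procedure would be called.
    print(entry_id, history)
```

Because the pairs are walked in a loop, a row with 0 entries and a row with 110 entries go through the same code path.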
If you want to solve this in SSIS without the staging table, you could create a destination script component. You could use a switch statement or a hashtable to look up the right sproc to execute for each data row.
It is unclear whether this is a better solution than the staging table approach above, but it is an alternative.
I know you already accepted an answer, but couldn't you use an Unpivot task to achieve what you wanted to do here?