I'm using an SSIS Script Task to dynamically create and import staging tables on the fly from CSVs, as there are so many (30+).
For example, a table called 'Customer_03122018_1305' will be created in SQL Server, based on the name of the CSV file. How do I then insert that data into the real 'Customer' table?
Please note: there are other tables, e.g. 'OrderHead_03122018_1310', that will need to go into an 'OrderHead' table, and likewise for 'OrderLines_03122018_1405' etc.
I know how to perform the SQL insert, but the staging table names will be constantly changing based on the CSV date/time stamp. I'm guessing this will be a script task?
I'm thinking of using a control table when I originally import the CSVs and then looking up the real table name?
Any help would be appreciated.
Thanks.
You can follow the process below to dynamically load all the staging tables into the main Customer table using a For Loop:
1. While creating the staging tables dynamically, store all the staging table names in a single variable, separated by commas.
2. Store the count of staging tables created in another variable.
3. Use a For Loop container and iterate it as many times as there are staging tables.
4. Inside the For Loop, use a Script Task to fetch the first staging table name from the list into a separate variable.
5. After the Script Task, still inside the For Loop container, add a Data Flow Task and, inside it, build the OLE DB Source dynamically from the variable populated in step 4.
6. Load the rows from the staging table into the actual table.
7. Remove that staging table name from the variable created in step 1 (the one containing all the staging table names separated by commas).
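If you'd rather collapse steps 5 and 6 into a single Execute SQL Task instead of a Data Flow, the per-iteration work boils down to a dynamic INSERT ... SELECT. A minimal sketch, assuming the staging name is always the real table name plus a date/time suffix and the real name itself contains no underscore:

-- @StagingTable holds the value fetched in step 4
DECLARE @StagingTable sysname = N'Customer_03122018_1305';

-- Derive the real table name by stripping everything from the first underscore
DECLARE @TargetTable sysname =
    LEFT(@StagingTable, CHARINDEX('_', @StagingTable) - 1);

DECLARE @sql nvarchar(max) =
    N'INSERT INTO ' + QUOTENAME(@TargetTable) +
    N' SELECT * FROM ' + QUOTENAME(@StagingTable) + N';';

EXEC sp_executesql @sql;

This assumes the staging and real tables have identical column lists in the same order; otherwise, list the columns explicitly.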
I have an SSIS package in which the flow is:
Get the data from a flat file source and insert it into a staging table.
Use the staging table data for a transformation using a SELECT with a WHERE clause, and then insert the filtered data into the destination table.
For the 1st point, I have used a Data Flow Task to get the data from the source and insert it into the staging table. For the 2nd point, I am confused about how to do it. I am using an Execute SQL Task to run the SELECT-WHERE query, but I don't see how to insert that query result into the destination table. Which SSIS component should I use here? Or should I change the entire flow for better performance? Kindly suggest. Thanks in advance.
You are on the right track. Mostly, for a simple data import, I use this flow.
Let's say we have a destination table named FiscalYear.
The first thing I would do is create the staging table. If it exists, I drop it and recreate the table.
The next step is, using the data flow, to stage the file to the staging table.
For the last step, using an Execute SQL Task and a SQL Server MERGE query, I insert or update the data. To insert or update, you need a unique identifier for each row in the file. This unique identifier keeps you from inserting duplicates in case you run the package more than once.
This row identifier can be a single column or a combination of columns. In my case, I usually have a column named rowguid of type uniqueidentifier.
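A minimal sketch of that MERGE, matching on rowguid (the staging table name and the other column names are illustrative):

MERGE dbo.FiscalYear AS t
USING dbo.FiscalYear_Staging AS s
    ON t.rowguid = s.rowguid
WHEN MATCHED THEN
    -- Row already exists: refresh its values from the file
    UPDATE SET t.FiscalYearName = s.FiscalYearName,
               t.StartDate = s.StartDate
WHEN NOT MATCHED BY TARGET THEN
    -- New row: insert it
    INSERT (rowguid, FiscalYearName, StartDate)
    VALUES (s.rowguid, s.FiscalYearName, s.StartDate);

Rerunning the package then updates existing rows instead of duplicating them.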
I am using SSMS and cloning tables with the same structure by using "Script Table as -> CREATE To -> New Query Editor Window".
My database has around 100 tables, and my main task is to perform data archiving by creating a clone table (same constraints, indexes, triggers, and stats as the old table) and importing the data I want from the old table into the new table.
My issue is that inside the generated script, say I want to clone table A, there are also CREATE TABLE statements for table B, table K, etc., along with their index and constraint scripts. This makes the whole script very tedious and long.
I just want to focus on the table A script so I can clone it and insert the relevant data into it. I know it has something to do with my options settings, but I am unsure which scripting options I should set to True if I just want to clone a table with the same constraints, columns, indexes, triggers, and stats. Does anyone know why there are unrelated scripts, and how do I fix it?
I am trying to create some sort of automation whereby I can generate a series of pipe-delimited text extracts for about 100 different tables each month. Each extract would be based on a simple query like this:
SELECT *
FROM tablename
WHERE AsOfDate = 'currentmonth'
where both tablename and currentmonth would be variables. The tablename value would change for each of the tables, but currentmonth would remain the same throughout the execution.
I have been attempting to build an SSIS package that uses a ForEach Loop container that runs through a list of all the table names and passes that variable into a SQL string, which is then used by the OLE DB Data source in the data flow.
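The expression that builds the SQL string looks something like this (User::TableName and User::CurrentMonth are the variable names I'm assuming here):

"SELECT * FROM " + @[User::TableName]
+ " WHERE AsOfDate = '" + @[User::CurrentMonth] + "'"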
However, all of these tables have different columns. From what I can tell, it would not be feasible to do a simple OLE DB Source to Flat File Destination within that loop container, since the Flat File Connection Manager must be configured to account for the different columns of each table.
Would there be any feasible way to do this outside of configuring the process manually for each of the 100+ tables?
You could look into BiML, which programmatically creates your data flows based on metadata.
Or you could use a Script Task that loops through the tables, loops through their columns, and generates the text files itself, without using any data flow at all.
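Either way, the per-table metadata is cheap to fetch. For example, the header row of each pipe-delimited file could be built with a query like this (SQL Server 2017+ for STRING_AGG):

-- Builds the pipe-delimited header row for one table
SELECT STRING_AGG(COLUMN_NAME, '|')
       WITHIN GROUP (ORDER BY ORDINAL_POSITION) AS HeaderRow
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'tablename';  -- substitute each table name in the loop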
We have a large production MSSQL database (mdf approx. 400 GB) and I have a test database. All the tables, indexes, views, etc. are the same in both. I need to make sure that the data in the tables of these two databases stays consistent, so I need to insert all the new rows and update all the updated rows in the test DB from production every night.
I came up with the idea of using SSIS packages to keep the data consistent by checking for updated rows and new rows in all the tables. My SSIS flow is:
I have a separate SSIS package for each table. In order:
1. I get the timestamp value in the table, so that I fetch only the last day's rows instead of the whole table (see the example query after this list).
2. I get the rows of the table from production.
3. I use the Lookup component to compare this data with the test database table data.
4. I use a Conditional Split to work out whether each row is new or updated.
5a. If the row is new, I insert it into the destination.
5b. If the row is updated, I update it in the destination table.
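For step 1, the source query is along these lines (the datetime column name is just an example):

SELECT *
FROM dbo.SomeTable
WHERE LastUpdated >= DATEADD(DAY, -1, GETDATE());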
The data flow is in the MTRule and STBranch package in the picture.
The problem is, I'm repeating this single flow for each table, and I have more than 300 tables like this. It takes hours and hours :(
What I'm asking is:
Is there any way in SSIS to do this dynamically?
PS: Every single table has its own columns and PK values, but my data flow schema is always the same (below).
You can look into BiMLScript, which lets you create packages dynamically based on metadata.
I believe the best way to achieve this is to use expressions. They empower you to dynamically set the source and destination.
One possible solution might be as follows:
create a table which stores all your table names and PK columns (sketched after this answer)
define a package which loops through this table and builds a SQL statement from each row
call your main package and pass the statement to it
use the statement as the data source for your Data Flow
if applicable, pass the destination table as a parameter as well (another column in your config table)
This is how I processed several really huge tables: the data had to be fetched from 20 tables and moved to one single table.
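A minimal sketch of such a config table and the statement the loop would hand to the main package (all names are illustrative):

-- One row per table to process
CREATE TABLE dbo.ETLConfig (
    TableName   sysname       NOT NULL,
    PKColumns   nvarchar(400) NOT NULL,  -- e.g. 'OrderID' or 'OrderID,LineNo'
    TargetTable sysname       NOT NULL
);

-- What the looping package reads; SourceStmt feeds the Data Flow's source
SELECT TableName,
       TargetTable,
       N'SELECT * FROM ' + QUOTENAME(TableName) AS SourceStmt
FROM dbo.ETLConfig;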
Why do you need to use SSIS?
You are better off writing a stored procedure that takes the table name as a parameter and does your CRUD there. Then call the stored procedure in a Foreach Loop container in SSIS.
In fact you might be able to do everything using a Stored Procedure and scheduling it in a SQL Agent Job.
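A rough sketch of such a procedure (the PK and column names are placeholders, dynamic SQL is needed because the table name is a parameter, and I'm assuming the production database is reachable from the test instance, e.g. on the same server or via a linked server):

CREATE PROCEDURE dbo.SyncTable
    @TableName sysname
AS
BEGIN
    SET NOCOUNT ON;

    -- Id and SomeCol are placeholders; in practice the column lists
    -- would come from metadata for each table
    DECLARE @sql nvarchar(max) =
        N'MERGE TestDb.dbo.' + QUOTENAME(@TableName) + N' AS t
          USING ProdDb.dbo.' + QUOTENAME(@TableName) + N' AS s
              ON t.Id = s.Id
          WHEN MATCHED THEN
              UPDATE SET t.SomeCol = s.SomeCol
          WHEN NOT MATCHED BY TARGET THEN
              INSERT (Id, SomeCol) VALUES (s.Id, s.SomeCol);';

    EXEC sp_executesql @sql;
END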
I have been searching for about a week now and I was wondering if anyone may have a clue. I wrote a package to do the following:
Loop through a parent folder and its subfolders for CSVs with a particular naming structure (works).
Create a table for each .csv based on the enumeration of each file (works).
Import the data into SQL Server, each file into its own table, using the file name that was created as the table name in the OLE DB Destination (which does not work). It works if there is one fixed destination table for everything, but when I use the table-name variable it does not work.
What I did was add an Execute SQL Task to the Foreach container to create each table, with a variable for the file path mapped as an expression in the Foreach container, inside a CREATE TABLE query under the SqlStatementSource expression property. The tables are created, but when I use the variable that was mapped in the Foreach Loop as the table name variable in the OLE DB Destination, I get an error asking me to check that the table exists. The tables are created, but I cannot get the data inserted into their own tables. Even when I bypass the "Destination table has not been provided" error and run the package, nothing happens. I set DelayValidation to true, and still nothing. SSIS, from what I have seen so far, does some cool things. However, I am stuck right now. What else am I doing wrong?
I forgot to mention that the data is going to SQL Server.
Thanks for everything.
You can't create an OLE DB Destination at design time with a variable for a table name. The OLE DB Destination needs to know the table name and the columns so that it can pre-map the data flow columns to the table columns.
You have a couple of other options:
You can use BiML to dynamically create your dataflows and destinations.
You can use an OLE DB Command transformation as the last step of your data flow, and write a dynamic SQL statement that inserts each row in the data flow into the desired table.
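The per-row statement on that component would look roughly like this, with the ? parameters mapped to data flow columns (the column names are illustrative; the table name has to be spliced in via an expression, since OLE DB parameter markers can't supply an object name):

INSERT INTO Customer_03122018_1305 (Col1, Col2, Col3)
VALUES (?, ?, ?)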