I already have data in my SQL Server database that was extracted from Sybase. I need to continue extracting information from Sybase while taking into account the data already stored in SQL Server. Basically it is a "where not in" condition against a different database.
1) If you are identifying new data based on an id = id match, you can use a Lookup operation: cache all the ids already imported into SQL Server (assuming id is the primary key of the source data) and look each incoming row up against that cache.
2) Otherwise, if you are importing based on some timestamp column, you can store the maximum date imported in the last cycle in a control table and use it to build the WHERE clause of the source query dynamically to identify new rows, as sketched below.
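A minimal sketch of that control-table approach, assuming an illustrative dbo.ExtractLog table and a ModifiedDate column on the source (all names are placeholders, not your actual schema):

-- Control table that stores the high-water mark from the last load.
CREATE TABLE dbo.ExtractLog (
    SourceTable  sysname  NOT NULL PRIMARY KEY,
    LastLoadedAt datetime NOT NULL
);

-- Read the last loaded timestamp into an SSIS variable...
DECLARE @LastLoadedAt datetime;
SELECT @LastLoadedAt = LastLoadedAt
FROM dbo.ExtractLog
WHERE SourceTable = 'SomeSybaseTable';

-- ...and use it to build the Sybase source query dynamically, e.g.
-- SELECT * FROM SomeSybaseTable WHERE ModifiedDate > @LastLoadedAt

-- After a successful load, advance the high-water mark.
UPDATE dbo.ExtractLog
SET LastLoadedAt = (SELECT MAX(ModifiedDate) FROM dbo.ImportedData)
WHERE SourceTable = 'SomeSybaseTable';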
I am new to using SSIS and am on my third package. We are taking data from Oracle into SQL Server. On my Oracle table, the unique key is called recnum and is numeric(12,0). In this particular package, I am trying to take each record from Oracle, look it up in a SQL Server table to see if that unique key is found, and if not, add the record to the SQL Server table. My issue is that the lookup would not find a match. After much testing, I came up with the following method that works, but I don't understand why I had to do this.
How I currently have it working:
I get the data from Oracle. In the next step, I add a derived column that uses the Oracle column. (The expression is just that field, no other formatting.) Then in the Lookup I use the derived column instead of the column from Oracle.
We had already done this on another table where the unique key was numeric(8,0) and it worked ok without needing a derived column.
SSIS is very fussy about data types; Lookups only match reliably when the data types agree.
Double click on the Data Path lines between Data Flow objects to check data types. I use Data Conversion tasks or CAST statements to force matching data types when I use lookups.
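For example, a hedged sketch of the lookup-side fix for the question above: cast the key column in the Lookup's reference query so it comes through with the same type as the value arriving from Oracle (dbo.TargetTable is an assumed name):

-- Used as the Lookup connection's SQL query: the CAST forces recnum to
-- DT_NUMERIC(12,0) so it matches the Oracle source column.
SELECT CAST(recnum AS numeric(12,0)) AS recnum
FROM dbo.TargetTable;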
Hope this helps.
I am currently receiving a push-feed with data into a SQL Server Express database. The feed is updated randomly several times a minute (pushed into the database). I want to import the new updated records into Excel by making a call every three seconds from Excel to the SQL Server database.
I discovered that there are three methods to import the data from SQL Server into Excel:
https://www.excel-sql-server.com/excel-sql-server-import-export-using-vba.htm
import using QueryTable
import using ADO
import using Add-In
However, with these methods the complete table is imported from the database every time. I only want to import the newly added records since the last import, because the SQL Server database becomes very large.
I have two questions:
1) How can I import only the newly added records?
2) What is the most efficient method regarding speed and system-load from the above three import methods?
Thank you!
I do not think it is correct to say "import to Excel"; it would be more correct to say "export from SQL Server to Excel".
There are a few ways to do this. I will rate them from easiest/fastest to most complicated/slowest:
Easiest one: add a new column to your SQL table as an "exported" flag and set it for all exported rows. Next time, you can export only the newest records, i.e. those where the value in that column is still empty. The column can have type BIT or TINYINT, or even DATETIME to indicate when the record was exported. The only problem with this method is the case where you are not allowed to add any new columns to the table. Here is sample code to update the flag and extract the values in one statement:
-- Flags the not-yet-exported rows and returns their values in a single
-- statement; the OUTPUT clause streams the row values back to the caller.
UPDATE #tbl_TMP          -- your feed table
SET Flag = 1
OUTPUT DELETED.A, DELETED.B, DELETED.C
WHERE Flag = 0
If you are not allowed to add a new column, you might use existing columns in the table. For instance, if you have a timestamp or incremental value in the table, you can record the latest extracted value; all records with a bigger value than the stored one are the "newest records to extract".
If the table does not have a timestamp or incremental value, you can create another table that contains the list of key values already extracted. Then, by joining these tables, you can figure out which records are new.
If the table does not have a key value, you can store a hash of all (or several) columns in another table and then join on that hash to figure out the new ones, for example as sketched below.
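A rough sketch of that hash approach, with assumed table and column names (ColA/ColB/ColC stand in for the columns that identify a row):

-- Table that remembers which rows were already exported, keyed by a hash.
CREATE TABLE dbo.ExportedRows (RowHash varbinary(32) NOT NULL PRIMARY KEY);

-- New rows are the ones whose hash has not been recorded yet.
SELECT s.*
FROM dbo.FeedTable AS s
LEFT JOIN dbo.ExportedRows AS e
       ON e.RowHash = HASHBYTES('SHA2_256', CONCAT(s.ColA, '|', s.ColB, '|', s.ColC))
WHERE e.RowHash IS NULL;
-- After exporting, insert the same hash expression for these rows into dbo.ExportedRows.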
Create a linked server to the Excel data.
Put an INSERT trigger on the SQL Server table to push every new row into the spreadsheet, roughly as sketched below.
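A very rough sketch of such a trigger, assuming a linked server named EXCEL_FEED pointing at the workbook and a Sheet1$ range with matching columns (an untested illustration only; a trigger like this also adds overhead to every insert on the feed table):

-- Pushes each newly inserted row to the linked Excel sheet.
CREATE TRIGGER dbo.trg_FeedToExcel
ON dbo.FeedTable
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO EXCEL_FEED...[Sheet1$] (ColA, ColB, ColC)
    SELECT ColA, ColB, ColC
    FROM inserted;
END;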
We have a large production MSSQL database (mdf approx. 400 GB) and I have a test database. All the tables, indexes, views, etc. are identical in both. I need to make sure that the data in the tables of these two databases stays consistent, so I need to insert all the new rows and update all the changed rows from production into the test DB every night.
I came up with the idea of using SSIS packages to keep the data consistent by checking for updated and new rows in all the tables. My SSIS flow is:
I have a separate SSIS package for each table. In order, the steps are:
1) I get the timestamp value from the table so I only pull the last day's rows instead of the whole table.
2) I get those rows from the production table.
3) Then I use the Lookup component to compare this data with the test database table's data.
4) Then I use a Conditional Split to work out whether each row is new or updated.
5) If the row is new, I insert it into the destination.
6) If the row is updated, I update it in the destination table.
The data flow is in the MTRule and STBranch packages in the picture.
The problem is that I am repeating this whole flow for every single table, and I have more than 300 tables like this. It takes hours and hours :(
What I am asking is: is there any way in SSIS to do this dynamically?
PS: Every table has its own columns and PK values, but my data flow schema is always the same (below).
You can look into BiMLScript, which lets you create packages dynamically based on metadata.
I believe the best way to achieve this is to use Expressions. They let you set the source and destination dynamically.
One possible solution might be as follows:
create a table which stores all your table names and PK columns
define a package which loops through this table and builds a SQL statement for each entry
call your main package and pass the statement to it
use the statement as the data source for your data flow
if applicable, pass the destination table as a parameter as well (another column in your config table)
This is how I processed several really huge tables: the data had to be fetched from 20 tables and moved to one single table.
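A minimal sketch of the config table and the statement-building step described above (all names are illustrative; the generated statement would be handed to the child package and used as the data flow source via an expression):

-- Config table the Foreach Loop iterates over.
CREATE TABLE dbo.TableConfig (
    TableName sysname NOT NULL,
    PKColumn  sysname NOT NULL,
    TSColumn  sysname NOT NULL  -- datetime column used for the last-1-day filter
);

-- Build the source statement for one table.
SELECT 'SELECT * FROM ' + QUOTENAME(TableName) +
       ' WHERE ' + QUOTENAME(TSColumn) + ' >= DATEADD(DAY, -1, GETDATE())' AS SourceStmt
FROM dbo.TableConfig
WHERE TableName = 'SomeTable';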
Why do you need to use SSIS?
You are better off writing a stored procedure that takes the tablename as parameter and doing your CRUD there. Then call the stored procedure in a FOR EACH component in SSIS.
In fact you might be able to do everything using a Stored Procedure and scheduling it in a SQL Agent Job.
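As a hedged sketch of that idea, a per-table sync procedure might look roughly like this, assuming a linked server named PROD pointing at production, and that the real key and column lists would be built from metadata rather than hard-coded:

CREATE PROCEDURE dbo.usp_SyncTable
    @TableName sysname
AS
BEGIN
    SET NOCOUNT ON;

    -- Illustrative only: Id and ColA stand in for the PK and column list,
    -- which in practice would come from sys.columns or a config table.
    DECLARE @sql nvarchar(max) =
        N'MERGE ' + QUOTENAME(@TableName) + N' AS tgt
          USING PROD.ProdDb.dbo.' + QUOTENAME(@TableName) + N' AS src
             ON tgt.Id = src.Id
          WHEN MATCHED THEN
             UPDATE SET tgt.ColA = src.ColA
          WHEN NOT MATCHED THEN
             INSERT (Id, ColA) VALUES (src.Id, src.ColA);';

    EXEC sys.sp_executesql @sql;
END;

The procedure could then be called from a Foreach Loop in SSIS, or simply scheduled directly as a SQL Agent job step, per the suggestion above.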
I have been searching for about a week now and I was wondering if anyone may have a clue. I wrote a package to do the following:
Loop through a parent folder and its subfolders for CSV files with a particular naming structure (works).
Create a table for each .csv based on the enumeration of each file (works).
Import the data into SQL Server, each file into its own table, using the generated file name as the table name in the OLE DB Destination (this does not work). It works if there is one fixed destination table for everything, but when I use the table-name variable it does not work.
What I did was add an Execute SQL Task to the Foreach container that creates a table, using a variable for the file path mapped as an expression in the Foreach container; the CREATE TABLE query is set through the SqlStatementSource expression property. The tables are created, but when I use the variable mapped in the Foreach Loop as the table name in the OLE DB Destination, I get an error asking me to check whether the table exists. The tables are created, but I cannot get the data inserted into their own tables, even when I bypass the "Destination table has not been provided" error and run the package. I set DelayValidation to true and still nothing. From what I have seen so far, SSIS does some cool things, but I am stuck right now. What else am I doing wrong?
I forgot to mention that the data is going to SQL Server.
Thanks for everything.
You can't create an OLEDB Destination at design time with a variable for a table name. The OLEDB destination needs to know the table name, and the columns, so that it can pre-map the data flow to the table columns.
You have a couple of other options:
You can use BiML to dynamically create your dataflows and destinations.
You can use an ExecuteSQL Transformation as your dataflow destination, and write a dynamic SQL statement that inserts each row in the dataflow to the desired table.
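A hedged sketch of the dynamic statement such a destination could execute, with the table name supplied by the Foreach Loop variable and the columns mapped as parameters (the DECLAREs at the top only stand in for those SSIS-supplied values):

DECLARE @TableName sysname       = N'SomeGeneratedTable';
DECLARE @Col1      nvarchar(200) = N'value1';
DECLARE @Col2      nvarchar(200) = N'value2';

-- QUOTENAME guards the dynamically supplied table name.
DECLARE @sql nvarchar(max) =
    N'INSERT INTO ' + QUOTENAME(@TableName) + N' (Col1, Col2) VALUES (@p1, @p2);';

EXEC sys.sp_executesql @sql,
     N'@p1 nvarchar(200), @p2 nvarchar(200)',
     @p1 = @Col1, @p2 = @Col2;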
I have an SSIS package that I want to use to update a column in a data warehouse staging table based on the values of a surrogate key mapping table that contains the surrogate key paired with the natural key. Specifically, I want to use the cached Lookup to update the fact staging table so it contains the surrogate key for the inventory dimension, in the same way the following SQL would.
UPDATE A
SET A.DWHSurrogateKey = B.DWHSurrogateKey
FROM SaleStagingTable A INNER JOIN inventoryStagingTable B ON B.OLTPInventoryKey = A.OLTPInventoryKey
Unfortunately the nature of the data flow from Lookup transformation to destination means that it creates a whole new row, rather than updating the existing matched row. Is it possible to manipulate SSIS to do this?
Couple of constraints:
My destination is an ADO .NET destination, and we cannot use OLE DB Destinations or sources (we need to be able to use named parameters and you can't do that with OLE DB Connections)
I need to do this for multiple dimensions to link them to the fact table, so I can't just push the mapped data to new tables every time, as that becomes really messy and hard to manage
I'd like to be able to do what these guys have suggested but with ADO connectors rather than OLE DB:
http://redsouljaz.wordpress.com/2009/11/30/ssis-update-data-from-different-table-if-data-is-null/
http://www.rad.pasfu.com/index.php?/archives/46-SSIS-Upsert-With-Lookup-Transform.html
For such a simple update I would use an Execute SQL Task and save the hassle of messing around with data flows. If you have lots of similar updates but with different fields and tables, I would store the column and table names in a Foreach Loop Container using a Foreach Item Enumerator. I would then add a Script Task that takes the item names and generates dynamic SQL, stored in a variable. Next, add the Execute SQL Task and have it use that SQL variable.
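For instance, the generated statement for one dimension might look roughly like this (a sketch only; the table and column names would come from the Foreach item rather than being hard-coded):

-- In the package these two values come from the Foreach Item Enumerator.
DECLARE @DimTable sysname = N'inventoryStagingTable';
DECLARE @KeyCol   sysname = N'OLTPInventoryKey';

DECLARE @sql nvarchar(max) =
    N'UPDATE A
        SET A.DWHSurrogateKey = B.DWHSurrogateKey
      FROM SaleStagingTable A
      INNER JOIN ' + QUOTENAME(@DimTable) + N' B
         ON B.' + QUOTENAME(@KeyCol) + N' = A.' + QUOTENAME(@KeyCol) + N';';

EXEC sys.sp_executesql @sql;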