SSIS no-match lookup? SQL server integration services - prevent duplicate rows

SSIS no-match lookup? SQL server integration services - prevent duplicate rows - sql-server

In ssis 2012, let's presume I simply copy customer data from one DB Source to a DB Destination (both are different database instances, one cannot "see" the other).
How do I prevent adding customer data I already added before. In other words, when I rerun the task, it should not add the customer twice or more (only the ones that previously failed). We have a non-unique reference available in the destination customer table e.g. 'SourceCustomerID' which is non-unique!
So we cannot rely on some unique index in the Destination table(s), and if we could, I don't want go this way (would cause failures)...
Added based on questions below: there ARE columns that uniquely identify data in the target table, and we need these for this, but these are nor implemented as unique indexes, nor do I want to let the job (or rows) fail like this. I want to prevent adding these rows in a controlled way.
I tried the lookup component, playing with "Lookup No Match Output", etc...no luck yet.
Any ideas how to accomplish this using the SSIS principles??
Best regards
Bart.

Use the SCD component
https://msdn.microsoft.com/en-us/library/ms141715.aspx
You map the business key which will check for existing record and you can insert/update. You can alter it to insert only.

Related

Adding a column to a table in SQLite

I've got a table in SQLite, and it already has many rows stored in it. I know realise I need another column in the table. Up to now I've just deleted the database and started again because the data has just been test data. But now the data in the database can't be deleted.
I know the query to add a column to the table, my question is what is a good way to do this so that it works for both existing users and new users? (I have updated the CREATE query I have for when the table is not found (because it's a new user or an existing user has cleared the database). It seems wrong to have an ALTER query in software that ships, and check every time. Is there some way of telling SQLite to automatically add the column if it doesn't exist during the UPDATE query I now need?
If I discover I need more columns in the future, is having a bunch of ALTER statements on startup (or somewhere?) really the best way to do it?
(If relevant this is for a node js app)

I'd just throw a table somewhere that marks what version of your database it is, and check that to determine if an update is needed. Either that or if you have a table already where there's always going to be just one record in it add a new field 'DatabaseVersion' to it.
So for example if you check the version number, and find it's a version 1 database when the newest version should be version 3, you know which updates to perform on it.

You can use PRAGMA user_version to store the version number of the database and check if the database needs to be updated.

Use SSIS Lookup Transformation to update ADO .NET Destination table

I have an SSIS package that I want to use to update a column in a datawarehouse staging table based on the values of a surrogate key mapping table that contains the surrogate key paired with the natural key. Specifically I want to use the cache Lookup to update the fact staging table to contain the surrogate key for the inventory dimention in the same way that the following SQL would.
UPDATE A
SET A.DWHSurrogateKey = B.DWHSurrogateKey
FROM SaleStagingTable A INNER JOIN inventoryStagingTable on B.OLTPInventoryKey = A.OLTPInventoryKey
Unfortunately the nature of the data flow from Lookup transformation to destination means that it creates a whole new row, rather than updating the existing matched row. Is it possible to manipulate SSIS to do this?
Couple of constraints:
My destination is an ADO .NET destination, and we cannot use OLE DB Destinations or sources (we need to be able to use named parameters and you can't do that with OLE DB Connections)
I need to do this for multiple dimensions to link them to the fact table, so I can't just push the mapped data to new tables every time, as that becomes really messy and hard to manage
I'd like to be able to do what these guys have suggested but with ADO connectors rather than OLE DB:
http://redsouljaz.wordpress.com/2009/11/30/ssis-update-data-from-different-table-if-data-is-null/
http://www.rad.pasfu.com/index.php?/archives/46-SSIS-Upsert-With-Lookup-Transform.html

For such a simple update I would use an Execute SQL Task and save the hassle of having to mess around with a data flows. If you have lots of similar updates but with different fields and tables, I would store the column and table names in a Foreach Loop Container using a Foreach Item Enumerator, I would then add a Script Task that would take the item names and generate some dynamic SQL which could be stored in a variable, Next add the Execute SQL Task and get it to use the SQL variable.

Dynamic SQL statement return value using the current target connection

I'm currently creating my first real life project in Pervasive. The task is to map a certain XML structure containing orders (as in shops and products) to 3 tables I created myself. These tables rest inside a MS-SQL-Server instance.
All of the tables have a unique key called "id", an automatically incremented column. I've dropped this column from all mappings so that Pervasive will not try to fill it itself.
For certain calculations, for a split key in one of the tables and for references to the created records in other tables, I will need the id that the database has just created. For that, I have googled the answer. I can use "select ##identity;" as a statement, and this returns the id that has most recently been created for the current connection. This means that in Pervasive, I will have to execute this statement using the already existing target connection object.
But how to do that? I am quite sure that I will need a JDImport or DJExport object, but how to get one associated with the current connection that Pervasive inserts the records by?
Or is there any other way to handle this auto increment when I need to reference the id in other tables?

Not sure how things work in Pervasive, but you may run into issues with ##identity,. Scope_identity() would probably be safer but may still not work in Pervasive.
Hopefully your tables have a natural key in addition to the generated id, in which case you can select your id based on the natural key. This will avoid any issues you may have with disparate sessions and scope.

If there is anyone looking this post up and wonders about the answer, it's "You can't". Pervasive does not allow access to their very own connection object, the one they use to query the database. Without access to it, you cannot guaranteed fetch the right id. The solution for us was this: We used a stored procedure which we called in the Before-Transformation event that created the header record and returned the id and an optional error message as a table. We executed it and it returns the id we then save and use throughout our mapping.

Merging multiple Access databases into SQL Server

We have a program in which each user is given their own Access database. We'd like to merge these all together into a single SQL Server database.
The problem is that, using the SQL Server import/export wizard, the primary/foreign keys do not get updated. So for instance if one user has this table:
1 Apple
2 Banana
and another user has this:
1 Coconut
2 Cheeseburger
the resulting table looks like this:
1 Apple
2 Banana
1 Coconut
2 Cheeseburger
Similarly, anything that referenced Banana by its primary key (2) is now referencing both Banana and Cheeseburger, which will not make the vegans very happy.
Is there any way to automatically update the primary/foreign key references when importing, other than writing an extremely long and complex import-script?

If you need to keep them fully compartmentalized, you have to assign some kind of partitioning column to each table. Is there a reason you need your SQL Server to have the same referential integrity as Access? Are you just importing to SQL Server for read-only reporting? In that case, I would not bother with RI. The queries will all require a partitionid/siteid/customerid. You could enforce that for single-entity access by wrapping tables with a table-valued UDF which required the partitionid. For cross-site that doesn't work.
If you are just loading to SQL Server for reporting, I would also consider altering the data model to support reporting (i.e. a dimensional model is sometimes better than a normalized model) instead of worrying about transaction processing.
I think we need to know more about the underlying goals.

Need more information of requirements.
My basic question is 'Do you need to preserve the original record key?' e.g. 1:apple in table T of user-database A; 1:coconut in table T of user-database B. Table T is assumed to have the same structure in all database instances. Reasons I can suppose that you may want to preserve the original data: (a) you may have a requirement to the reference the original data (maybe a visual for previous reporting), and/or (b) there may be a data dependency in the application itself.
If the answer is 'no,' then you are probably interested only in preserving all of the distinct data values. Allow the SQL table to build using a new key and constrain the SQL table field such that it contains unique data. This approach seems to preserve the original table structure (but not the original key value or its 'location') and may suffice to meet your requirement.
If the answer is 'yes,' I do not see a way around creating an index that preserves a pointer to the original database and the key that was created in its table T. This approach would seem to require an application modification.
The best approach in this case is probably to split the incoming data into two tables: one to identify the database and original key, another to identify the distinct data values. For example: (database) table D has records such as 'A:1:a,' 'A:2:b,' 'B:1:c,' 'B:2:d,' 'B:15:a,' 'C:8:a'; (data) table T1 has records such as 'a:apple,' 'b:banana,' 'c:coconut,' 'd:cheeseburger' where 'A' describes the original database 'location,' 1 is the original value in location 'A,' and 'a' is a value that equates records in table D and table T1. (Otherwise you have a lot of redundant data in the one table; e.g. A:1:apple, B:15:apple, C:8:apple.) Also, T1 has a structure similar to the original T and is seems to be more directly useful in the application.

Ended up creating an SSIS project for this. SSIS is a visual programming tool made by Microsoft (and part of their "Business Integration Studio", which comes with SQL Server) designed for solving exactly these sorts of problems.

Why not let Access use its replication manager to merge the databases? This will allow you to identify the conflicts and resolve them before importing to SQL Server. I'm fairly confident it will retain the foreign key relationships. If I understand your situation correctly, and the databases are the same structure with different data, you could load the combined database to the application and verify the data before moving to SQL Server.
What version of Access are you using? Here's a link for Access 2000. Use the language to adjust search parameters to fit your version.
http://technet.microsoft.com/en-us/library/cc751054.aspx

Integration Service: Best way to compare two Tables to retrieve missing Keys for use in later Decisions

I am currently trying to develop a SSIS Package and am lost because I am a no0b. I have two tables, one is updated and one needs to be updated. I need to compare these two tables and find the primary keys that have been added to the first table and that are not present in the second table.
I need these primary keys in the next set of queries which would use these to determine what is returned (WHERE). Is this possible with SSIS? If so, which toolbox items should I be concentrated on?

This is a common task and I would do the following:
(1) Within a Data Flow Component drag over your OLE DB Source (probably already have that).
(2) Connect the source component with a Lookup Component. Within that lookup choose the other table in which you wish to compare to and in the column tabs (on the left) match up the primary key columns and check any column you'd like to be returned.
Lastly, go into the Error Handling section within that same Lookup component (on the left as well) and choose Redirect on Failure.
(3) Now when you choose your next component (whatever that may be) you will pick the red flow connector arrow and that will be the nonmatches.
Make sense? Likewise, you can still use the matched flow by connecting the green flow arrow to yet another component. Hope this helps.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight