I need to write a batch update statement, and I am able to do that. I don't have a primary key on my table, and there is a chance that duplicate data will be sent to the database.
I want to write the batch update in such a manner that it will insert only if the data does not exist. When I say the data does not exist, I mean three columns of the table that can uniquely identify a row. I don't want to make a primary key out of these three columns.
Is there a way to write a batch update that will insert only if the data does not exist, and otherwise do the update?
I have tried a MERGE query but could not get it to work.
Thanks
You can use an ItemProcessor to filter out duplicate items with a query: just return null if the item is already present in the database. The objects that pass the processor can then be written with the ItemWriter, and you can be sure they are not duplicates.
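For illustration, the duplicate check can be a simple existence query on the three natural-key columns; if it returns a non-zero count, the processor returns null and the item is skipped. The table and column names below are placeholders, not your actual schema:
-- Hypothetical natural-key lookup the processor could run for each item;
-- bind the item's three identifying column values to the parameters.
SELECT COUNT(*)
FROM target_table
WHERE col1 = ? AND col2 = ? AND col3 = ?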
I have a simple mapping which deletes records in the target table. I have not used an "UPDATE STRATEGY" transformation; instead I set the session property to "delete" in order to delete records.
The table has a composite primary key (made up of 10 columns). It works fine when all of these columns have values, BUT there are a few records in which one of the columns has a NULL value. In that scenario, the record is not deleted.
Can someone let me know how to handle this situation?
That's possible, because Informatica fires SQL like this when deleting data: DELETE FROM tab WHERE key1 = v1 AND key2 = v2. So if v2 is NULL, the equality comparison never matches and the DELETE will skip the record.
You can use the target update override property to handle this: write your own SQL to delete the data.
DELETE FROM
mytable
WHERE
ID = :TU.ID
AND ISNULL(EMP_NAME,'Unspecified') = ISNULL(:TU.EMP_NAME,'Unspecified')
Since you have the keys defined in Informatica you shouldn't face any problem. But please note that these deletes will be done on a row-by-row basis, so if it's a large table and the delete doesn't follow the primary key index, it could take time to delete each row!
I have a DynamoDB table called 'Table'. There are a few attributes in the table, including one called 'updated'. I want to set the 'updated' field to '0' on every item without having to provide a key, to avoid fetching and searching the table.
I tried a batch write, but it seems that update_item requires a Key input. How can I efficiently update the entire column so that every value is 0?
I am using a Python script.
Thanks a lot.
At this point you cannot do this; you have to pass a key (the partition key, or the partition key and sort key) to update an item.
Currently, the only way is to scan the table, using a filter expression to find the items whose "updated" attribute is not already 0, and collect their respective keys.
Then pass those keys to update_item and set the value.
Hopefully AWS will come up with something better in the future.
If you can obtain the partition keys some other way, you can loop over them and update each item directly.
I am trying to make a simple database where one of the tables has a foreign key that references another column in that table.
I've been able to load data correctly with SQL*Loader (from a CSV file) before adding this constraint, but once I add it, I am not able to load data with SQL*Loader (all rows get rejected).
Is there some way to resolve this? I have been searching online for a few hours and haven't found anything very specific. I have found examples of direct path loads, but I don't want to assume direct path loading is set up on this Oracle instance (the loading I use is conventional). Is there a set of steps I can follow to successfully load this data, or is there a parameter I can set to force the load of the data?
This can be a quick workaround for the issue. Write a Unix or batch script that performs the following steps:
Assume column1 is the primary key field and column2 is the self-referencing foreign key field.
Load the data into a temporary table without the self-referencing constraint.
Now insert into the target table from the temporary table, inserting only distinct values of column1 and populating column2 as NULL in this statement.
Update the inserted records of the target table with the self-referencing column2 values. Since all the distinct values have already been inserted, this will not cause a self-referencing key error.
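A rough SQL sketch of the insert and update steps (target_table, temp_table, and the column names are placeholders for the real schema):
-- Copy distinct key values from the staging table, leaving the
-- self-referencing column NULL so the constraint cannot fail yet.
INSERT INTO target_table (column1, column2)
SELECT DISTINCT column1, NULL
FROM temp_table;
-- Back-fill column2 once every referenced row exists.
-- Assumes each column1 maps to a single column2 value in the staging data.
UPDATE target_table t
SET t.column2 = (SELECT DISTINCT s.column2
                 FROM temp_table s
                 WHERE s.column1 = t.column1);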
I'd like to copy a table's rows before updating them, and I'm trying to do it like this:
CREATE TRIGGER first_trigger_test
on Triggertest
FOR UPDATE
AS
insert into Triggertest select * from Inserted
Unfortunately, I get the error message
Msg 8101, Level 16, State 1, Procedure first_trigger_test, Line 6
An explicit value for the identity column in table 'Triggertest' can only be specified when a column list is used and IDENTITY_INSERT is ON.
I assume it's because of the id column; can't I do something like 'except id'? I don't want to list all the columns in the trigger, as it should be as dynamic as possible...
You can't, basically. You'll either have to specify the columns, or use a separate table:
CREATE TRIGGER first_trigger_test
on Triggertest
FOR UPDATE
AS
insert into Triggertest_audit select * from deleted
(where Triggertest_audit is a second table that looks like Triggertest, but without the primary key/identity/etc. - commonly multiple rows per logical source row; note I assumed you actually wanted to copy the old values, not the new ones)
The problem happens because you are trying to set an identity column in Triggertest.
Is that your plan?
If you want to copy the new identity values from INSERTED into Triggertest, then define the column in Triggertest without IDENTITY.
If Triggertest has its own IDENTITY column, use this:
insert into Triggertest (col1, col2, col3) select col1, col2, col3 from Inserted
After comment:
No, you can't without dynamic SQL to detect the table and find all of its non-identity columns.
However, if you add or remove columns you'll then have a mismatch between the trigger's table and Triggertest, and you'll get a different error.
If you really want it that dynamic, you'd have to concatenate all columns into one, or use XML to ignore the schema.
Finally:
Do all your tables have exactly the same number of columns, with the same datatypes and nullability, as Triggertest? Because that is the assumption here...
If you want the table to be built each time the trigger runs, then you have no choice but to use the system tables to find the columns and create a table with those column definitions. Of course, your first step will have to be to drop the existing table, or the trigger won't work the second time someone updates a record.
However, I think you need to rethink this process. Dropping a table and then creating a new one every time you change a record is a seriously bad idea. How is this table in any way useful when it may get wiped out and rebuilt every second or so?
What you might consider doing instead is to create a dynamic process that generates the CREATE TRIGGER scripts with the correct column information for each table, but where the triggers themselves are not dynamic. Then your configuration people need to run this process every time table changes are made.
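As a rough illustration of that kind of generator (a sketch only, assuming SQL Server 2017+ for STRING_AGG; the table and audit-table names are placeholders):
-- Build a static audit trigger for one table by reading its
-- non-identity columns from the system catalog.
DECLARE @table sysname = N'Triggertest';
DECLARE @cols nvarchar(max);
SELECT @cols = STRING_AGG(QUOTENAME(c.name), ', ')
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(@table)
  AND c.is_identity = 0;
-- The generated trigger is static: the column list is baked in at generation time.
SELECT N'CREATE TRIGGER trg_' + @table + N'_audit ON ' + QUOTENAME(@table) + N'
FOR UPDATE
AS
insert into ' + QUOTENAME(@table + N'_audit') + N' (' + @cols + N')
select ' + @cols + N' from deleted;' AS generated_script;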
Remember, it is critical for triggers to do two things: run as fast as humanly possible, and account for processing all the records in the batch (triggers should never do row-by-row processing or other slow processes, or assume only one row will be in the inserted or deleted tables). Dynamic SQL in a trigger is probably also a bad idea, as you can't test out all the possibilities beforehand, and it can bring your whole production server to a screaming halt when something unexpected happens.
I have a table in my source DB that is self-referencing:
|BusinessID|...|ParentID|
This table is modeled in the DW as
|SurrogateID|BusinessID|ParentID|
The first question is: should the ParentID in the DW reference the surrogate ID or the business ID? My idea is that it should reference the surrogate ID.
Then my problem occurs: in my SSIS data flow task, how can I look up the surrogate key of the parent?
If I insert all rows where ParentID is null first, and then the ones that are not null, I solve part of the problem.
But I still have to look up the rows that may reference a parent which is also a child.
That is, I have to make sure the parents are loaded into the DB first to be able to use the Lookup transformation.
Do I have to resort to a for-each loop with sorted input?
One trick I've used in this situation is to load the rows without the ParentID. I then use another data flow to create an update script based on the source data and the loaded data, and then use a SQL task to run the created update script. It won't win prizes for elegance, but it does work.
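For illustration, the generated update can boil down to a self-join on the business keys once all rows are in the dimension. A minimal sketch, where DimEntity and Staging are assumed names rather than the actual tables:
-- Resolve each row's ParentID to the parent's surrogate key after the load.
-- Staging holds the source rows with the original BusinessID -> ParentID
-- (business key) relationship.
UPDATE d
SET d.ParentID = p.SurrogateID
FROM DimEntity AS d
JOIN Staging AS s ON s.BusinessID = d.BusinessID
JOIN DimEntity AS p ON p.BusinessID = s.ParentID
WHERE s.ParentID IS NOT NULL;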