Using MERGE to insert or delete data - sql-server

From the UI I pass a datatable to a stored procedure. The type of that parameter is a user-defined table type with the following structure:
PersonKey int
ComponentKey varchar
This data needs to go into a table, and any data that exists in the table but is not in the datatable should be deleted.
Example table data
PersonKey ComponentKey
123 A1
456 B9
And my datatable has 2 rows, one matching row and one new row
Example datatable data
PersonKey ComponentKey
123 A1
786 Z6
The result is that the 456/B9 row should be deleted, nothing should happen to the 123/A1 row, and the 786/Z6 row should be inserted.
I believe I can use the MERGE statement, but I am not sure how to form it.
I understand that in WHEN NOT MATCHED I should do the insert, but where does the delete part come in?
MERGE Components
USING @passedInData
ON PersonKey = DatatblPersonKey AND ComponentKey = DatatblComponentKey
WHEN MATCHED THEN
    -- do nothing...
WHEN NOT MATCHED THEN
    INSERT (PersonKey, ComponentKey) VALUES (DatatblPersonKey, DatatblComponentKey);
Edit: Just to be clear, the datatable could contain many rows for the same person key, but the component key would be different.
Example datatable data
PersonKey ComponentKey
123 Z6
123 C5
Example table data
PersonKey ComponentKey
123 A1
456 B9
The result after inserting the above datatable should be
PersonKey ComponentKey
123 Z6
123 C5
456 B9
Notice that 123/A1 has been deleted and 456/B9 is still in the table.

The default "WHEN NOT MATCHED" assumes that what you really mean is "WHEN NOT MATCHED BY TARGET". You can do another statement for "WHEN NOT MATCHED BY SOURCE" with the simple command "DELETE".
Be careful when you do this because it will delete all the records from the target that don't match the source based on the comparison you have specified. If it's necessary to do a subset of the target for that action, you can use a cte with that filter and then do your merge against that cte as the target.
edit ... demonstrating how to hook up what I am saying:
DECLARE @databaseTable TABLE (PersonKey INT, ComponentKey VARCHAR(10));
INSERT INTO @databaseTable
VALUES
    (123, 'A1'),
    (456, 'B9');

DECLARE @appDataset TABLE (PersonKey INT, ComponentKey VARCHAR(10));
INSERT INTO @appDataset
VALUES
    (123, 'Z6'),
    (123, 'C5');

WITH cteTarget AS
(
    SELECT dt.PersonKey
         , dt.ComponentKey
    FROM @databaseTable AS dt
    JOIN (SELECT DISTINCT PersonKey FROM @appDataset) AS pk
        ON pk.PersonKey = dt.PersonKey
)
MERGE cteTarget AS tgt
USING @appDataset AS src
    ON src.PersonKey = tgt.PersonKey
    AND src.ComponentKey = tgt.ComponentKey
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PersonKey, ComponentKey)
    VALUES (src.PersonKey, src.ComponentKey);

SELECT * FROM @databaseTable;
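Running this against the sample data should produce the result described in the question: 123/A1 is deleted (person 123 is in the source but has no matching component), 456/B9 is untouched (the CTE filters person 456 out of the target because it never appears in the source), and 123/Z6 plus 123/C5 are inserted, leaving the table with rows 123/Z6, 123/C5, and 456/B9 (in no guaranteed order).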

Related

SQL - use an attribute to group activities and use the group as parameter

I have a table that looks like this:
ActivityID  Time Used  Activity Type  Activity Category ID  Activity Category
----------  ---------  -------------  --------------------  -----------------
123456      30         A              1                     X
765432      120        B              2                     Y
876462      65         C              3                     Z
h52635      76         D              3                     Z
hsgs62      187        E              1                     X
I would like to use the Activity Category as a parameter (@ActivityCategory) to filter my report later; the filter values would be X;Y;Z.
When I choose one Activity Category, the sum of Time Used should appear.
My question is: how should I build the query so that activities with the same Activity Category are grouped together, and the category can be used as a parameter?
Something like this perhaps:
-- Sample data
DECLARE @table TABLE (ActivityId INT, TimeUsed INT, ActivityCategory CHAR(1));
INSERT @table VALUES (123,20,'X'), (129,50,'Y'), (254,30,'Y'), (991,10,'Z');
-- Parameter
DECLARE @ActivityCategory VARCHAR(100) = 'X,Y';

SELECT t.ActivityCategory, TimeUsed = SUM(t.TimeUsed)
FROM @table AS t
CROSS APPLY STRING_SPLIT(@ActivityCategory,',') AS s -- STRING_SPLIT needs SQL Server 2016+; earlier versions need a custom splitter function
WHERE t.ActivityCategory = s.value
GROUP BY t.ActivityCategory;
Returns:
ActivityCategory TimeUsed
---------------- -----------
X 20
Y 80
Alan's answer is good, but I'd personally use a temp table and a join for performance reasons. The table being queried might be very large, in which case a join to a temp table would be more performant than CROSS APPLY.
The easiest way to pass multi-value parameters in and out of your query is as a comma-separated list. Indeed, if you are using Report Server / SSRS, that is how the "Multiple Value" box in the user interface will deliver the users' selections into a varchar parameter.
--Declare and set parameter
DECLARE @ActivityCategories varchar(MAX)
SET @ActivityCategories = 'X,Y,Z'

--Convert individual parameter values to a temp table
DROP TABLE IF EXISTS #ParameterValues
CREATE TABLE #ParameterValues (ActivityCategory varchar(10) NOT NULL PRIMARY KEY CLUSTERED)
INSERT INTO #ParameterValues WITH(TABLOCK)
SELECT value
FROM STRING_SPLIT(@ActivityCategories,',')
GROUP BY value
ORDER BY value

--Join on temp table to filter by parameter values
SELECT ActivityID,
       TimeUsed,
       ActivityType,
       ActivityCategoryID,
       ActivityCategory
FROM dbo.YourTable a
INNER JOIN #ParameterValues b ON a.ActivityCategory = b.ActivityCategory
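If you also want the per-category total the question asks for (the sum of Time Used), the same temp-table join works with a GROUP BY; a minimal sketch against the hypothetical dbo.YourTable above:
SELECT a.ActivityCategory,
       TimeUsed = SUM(a.TimeUsed)
FROM dbo.YourTable a
INNER JOIN #ParameterValues b ON a.ActivityCategory = b.ActivityCategory
GROUP BY a.ActivityCategory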

How to update table based on a json map?

If I have this table
CREATE TABLE tmp (
    a integer,
    b integer,
    c text
);
INSERT INTO tmp (a, b, c) VALUES (1, 2, 'foo');
And this json:
{
    "a": 4,
    "c": "bar"
}
Where the keys map to the column names, and the values are the new values.
How can I update the tmp table without touching columns that aren't in the map?
I thought about constructing a dynamic SQL update statement to execute in PL/pgSQL, but it seems the number of arguments passed to USING must be predetermined. The actual number of arguments is determined by the number of keys in the map, which is dynamic, so this seems like a dead end.
I know I can update the table with multiple update statements as I loop over the keys, but the problem is that I have a trigger set up on the table that revisions it (by inserting changed columns into another table), so the columns must be updated in a single update statement.
I wonder if it's possible to dynamically update a table with a JSON map?
Use coalesce(). Example table:
drop table if exists my_table;
create table my_table(id int primary key, a int, b text, c date);
insert into my_table values (1, 1, 'old text', '2017-01-01');
and query:
with jsondata(jdata) as (
    values ('{"id": 1, "b": "new text"}'::jsonb)
)
update my_table set
    a = coalesce((jdata->>'a')::int, a),
    b = coalesce((jdata->>'b')::text, b),
    c = coalesce((jdata->>'c')::date, c)
from jsondata
where id = (jdata->>'id')::int;
select * from my_table;
id | a | b | c
----+---+----------+------------
1 | 1 | new text | 2017-01-01
(1 row)
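One caveat: jdata->>'b' yields SQL NULL both when the key is missing and when its value is an explicit JSON null, so with this pattern a null in the map leaves the column unchanged rather than setting it to NULL. A quick illustration:
-- An explicit JSON null behaves like a missing key here: b keeps 'new text'
with jsondata(jdata) as (
    values ('{"id": 1, "b": null}'::jsonb)
)
update my_table set
    b = coalesce((jdata->>'b')::text, b)
from jsondata
where id = (jdata->>'id')::int;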

Update Or Skip Rows with duplicate Primary Key While Bulk Insert - SQL

I'm using the BULK INSERT method to insert rows from CSV files, but it fails on duplicate primary keys.
Here is my Sample Code:
Use People
Go
BULK
INSERT tblProfile
FROM 'F:\People.txt'
WITH
(
DATAFILETYPE='widechar',
CODEPAGE = 'ACP',
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
ERRORFILE = 'F:\ErrorRows.csv'
)
GO
I need to update fields on the rows with duplicate primary keys.
For example here is a sample of my table:
Code  Name  Family  City
-------------------------
45    Joe   Stone   USA
67    Sara  Stone   USA
68          Stone
If there is a row with code 68 in the CSV file, and that row has a name or city which is empty or null in my table, then the bulk insert should update and fill those fields; otherwise it should skip the duplicate primary key and still insert the other rows.
Is something like this possible?
As DoctorMick stated here
You could set the MAXERRORS property to quite a high value, which will allow the valid records to be inserted and the duplicates to be ignored. Unfortunately, this also means that any other errors in the dataset won't cause the load to fail.
Alternatively, you could set the BATCHSIZE property which will load the data in multiple transactions therefore if there are duplicates it will only roll back the batch.
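For example, here is a sketch of the original BULK INSERT with both options added, following DoctorMick's suggestion (the values shown are arbitrary assumptions to be tuned, not recommendations):
BULK INSERT tblProfile
FROM 'F:\People.txt'
WITH
(
    DATAFILETYPE = 'widechar',
    CODEPAGE = 'ACP',
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '\n',
    ERRORFILE = 'F:\ErrorRows.csv',
    MAXERRORS = 100000, -- tolerate failing rows instead of aborting the load
    BATCHSIZE = 1000    -- commit per batch so a failure only rolls back that batch
)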
Or use a temp table to filter the duplicates and update them:
-- Load the staging temp table (in practice, BULK INSERT the CSV into it)
INSERT INTO #tblProfile (Id, Col1) -- temporary table
VALUES
    (3, 'S3'),
    (4, 'S4'),
    (5, 'S5')

-- Remove duplicate Ids within the temp table itself (must happen before the insert)
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY Id
                              ORDER BY (SELECT 0)) RN
    FROM #tblProfile)
DELETE FROM cte
WHERE RN > 1

-- Insert only the rows whose Id is not already in the target table
INSERT INTO tblProfile
SELECT s.*
FROM #tblProfile s
WHERE NOT EXISTS (SELECT 1 FROM tblProfile t WHERE t.Id = s.Id)

-- Update existing rows, keeping the current value where the staged one is NULL
UPDATE T
SET T.Col1 = ISNULL(T1.Col1, T.Col1)
FROM tblProfile T JOIN #tblProfile T1 ON T.Id = T1.Id

Not allowing column values other than what is found in another table

I have 2 tables:
Table A: Column A1, Column A2
Table B: Column B1, Column B2
Column A1 is not unique and not the PK, but I want to put a constraint on column B1 so that it cannot have values other than those found in column A1. Can it be done?
It cannot be done with a plain FK. Instead you can use a check constraint that calls a function to see whether the B value is available in A.
Example:
create function dbo.fn_ValidateBValue(@B1 int)
returns bit as
begin
    declare @ValueExists bit
    select @ValueExists = 0
    if exists (select 1 from TableA where A1 = @B1)
        select @ValueExists = 1
    return @ValueExists
end
go
alter table TableB add constraint CK_BValueCheck check (dbo.fn_ValidateBValue(B1) = 1)
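A quick sanity check, with hypothetical data (assuming TableA.A1 contains the value 1 but not 2):
insert into TableB (B1) values (1) -- succeeds: 1 exists in TableA.A1
insert into TableB (B1) values (2) -- fails: CK_BValueCheck is violated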
You cannot have a dynamic constraint to limit the values in TableB. Instead you can either put a trigger on TableB, or restrict all inserts and updates on TableB to values selected from column A1:
Insert into TableB
Select Col from Table where Col in (Select ColumnA from TableA)
or
Update TableB
Set ColumnB = <somevalue>
where <somevalue> in (Select ColumnA from TableA)
Also, I would add that it is a very bad design practice and cannot guarantee accuracy all the time.
A long way around, but you could add an identity column iden to A and declare the PK as (iden, A1). In B, iden would just be an integer (not an identity).
You asked for any other ways: you could also create a 3rd table that both tables reference via FK, but that does not assure B1 is in A.
Here's the design I'd go with, if I'm free to create tables and triggers in the database, and still want TableA to allow multiple A1 values. I'd introduce a new table:
create table TableA (ID int not null,A1 int not null)
go
create table UniqueAs (
A1 int not null primary key,
Cnt int not null
)
go
create trigger T_TableA_MaintainAs
on TableA
after insert, update, delete
as
set nocount on
;With UniqueCounts as (
    select A1, COUNT(*) as Cnt from inserted group by A1
    union all
    select A1, COUNT(*) * -1 from deleted group by A1
), CombinedCounts as (
    select A1, SUM(Cnt) as Cnt from UniqueCounts group by A1
)
merge into UniqueAs a
using CombinedCounts cc
    on a.A1 = cc.A1
when matched and a.Cnt = -cc.Cnt then delete
when matched then update set Cnt = a.Cnt + cc.Cnt
when not matched then insert (A1, Cnt) values (cc.A1, cc.Cnt);
And test it out:
insert into TableA (ID,A1) values (1,1),(2,1),(3,2)
go
update TableA set A1 = 2 where ID = 1
go
delete from TableA where ID = 2
go
select * from UniqueAs
Result:
A1 Cnt
----------- -----------
2 2
Now we can use a genuine foreign key from TableB to UniqueAs. This should all be relatively efficient - the usual FK mechanisms are available between TableB and UniqueAs, and the maintenance of this table is always by PK reference - and we don't have to needlessly rescan all of TableA - we just use the trigger pseudo-tables.
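For instance, the genuine foreign key could be hooked up like this (TableB's shape is an assumption here):
create table TableB (
    B1 int not null references UniqueAs (A1)
)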

SQLite UPSERT / UPDATE OR INSERT

I need to perform UPSERT / INSERT OR UPDATE against a SQLite Database.
There is the command INSERT OR REPLACE, which in many cases can be useful. But if you want to keep your autoincrement IDs in place because of foreign keys, it does not work, since it deletes the row, creates a new one, and consequently this new row has a new ID.
This would be the table:
players - (primary key on id, user_name unique)
| id | user_name | age |
------------------------------
| 1982 | johnny | 23 |
| 1983 | steven | 29 |
| 1984 | pepee | 40 |
Q&A Style
Well, after researching and fighting with the problem for hours, I found out that there are two ways to accomplish this, depending on the structure of your table and whether you have foreign key restrictions activated to maintain integrity. I'd like to share this in a clean format to save some time for people who may be in my situation.
Option 1: You can afford deleting the row
In other words, you don't have foreign keys, or if you have them, your SQLite engine is configured so that there are no integrity exceptions. The way to go is INSERT OR REPLACE. If you are trying to insert/update a player whose ID already exists, the SQLite engine will delete that row and insert the data you are providing. Now the question comes: what to do to keep the old ID associated?
Let's say we want to UPSERT with the data user_name='steven' and age=32.
Look at this code:
INSERT OR REPLACE INTO players (id, user_name, age)
VALUES (
    coalesce((select id from players where user_name='steven'),
             (select max(id) from players) + 1),
    'steven',
    32)
The trick is in coalesce: it returns the id of the user 'steven' if one exists; otherwise it returns a fresh new id.
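One edge case: on an empty table, (select max(id) from players) + 1 evaluates to NULL, but inserting NULL into an INTEGER PRIMARY KEY column simply makes SQLite assign the next rowid itself, so the statement still works.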
Option 2: You cannot afford deleting the row
After monkeying around with the previous solution, I realized that in my case it could end up destroying data, since this ID works as a foreign key for another table. Besides, I created the table with the clause ON DELETE CASCADE, which would mean it'd delete data silently. Dangerous.
So, I first thought of an IF clause, but SQLite only has CASE. And this CASE can't be used (or at least I did not manage it) to perform one UPDATE query if EXISTS(select id from players where user_name='steven'), and an INSERT if it didn't. No go.
And then, finally, I used brute force, with success. The logic is: for each UPSERT that you want to perform, first execute an INSERT OR IGNORE to make sure there is a row with our user, and then execute an UPDATE query with exactly the same data you tried to insert.
Same data as before: user_name='steven' and age=32.
-- make sure it exists
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32);
-- make sure it has the right data
UPDATE players SET user_name='steven', age=32 WHERE user_name='steven';
And that's all!
EDIT
As Andy has commented, trying to insert first and then update may lead to firing triggers more often than expected. This is not, in my opinion, a data-safety issue, but it is true that firing unnecessary events makes little sense. Therefore, an improved solution would be:
-- Try to update any existing row
UPDATE players SET age=32 WHERE user_name='steven';
-- Make sure it exists
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32);
This is a late answer. Starting from SQLite 3.24.0, released on June 4, 2018, there is finally support for the UPSERT clause, following PostgreSQL syntax.
INSERT INTO players (user_name, age)
VALUES('steven', 32)
ON CONFLICT(user_name)
DO UPDATE SET age=excluded.age;
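Here excluded refers to the row that would have been inserted had the conflict not occurred, so excluded.age carries the incoming value 32.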
Note: For those having to use a version of SQLite earlier than 3.24.0, please reference this answer below (posted by me, @MarqueIV).
However, if you do have the option to upgrade, you are strongly encouraged to do so since, unlike my solution, the one posted here achieves the desired behavior in a single statement. Plus you get all the other features, improvements, and bug fixes that usually come with a more recent release.
Here's an approach that doesn't require the brute-force 'ignore', which would only work if there were a key violation. This approach works based on any conditions you specify in the update.
Try this...
-- Try to update any existing row
UPDATE players
SET age=32
WHERE user_name='steven';
-- If no update happened (i.e. the row didn't exist) then insert one
INSERT INTO players (user_name, age)
SELECT 'steven', 32
WHERE (Select Changes() = 0);
How It Works
The 'magic sauce' here is using Changes() in the Where clause. Changes() represents the number of rows affected by the last operation, which in this case is the update.
In the above example, if there are no changes from the update (i.e. the record doesn't exist) then Changes() = 0 so the Where clause in the Insert statement evaluates to true and a new row is inserted with the specified data.
If the Update did update an existing row, then Changes() = 1 (or more accurately, not zero if more than one row was updated), so the 'Where' clause in the Insert now evaluates to false and thus no insert will take place.
The beauty of this is there's no brute-force needed, nor unnecessarily deleting, then re-inserting data which may result in messing up downstream keys in foreign-key relationships.
Additionally, since it's just a standard Where clause, it can be based on anything you define, not just key violations. Likewise, you can use Changes() in combination with anything else you want/need anywhere expressions are allowed.
The problem with all the answers presented so far is their complete lack of attention to triggers (and probably other side effects).
A solution like
INSERT OR IGNORE ...
UPDATE ...
leads to both triggers being executed (for the insert and then for the update) when the row does not exist.
The proper solution is
UPDATE OR IGNORE ...
INSERT OR IGNORE ...
in which case only one statement takes effect (whether the row exists or not).
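A concrete sketch of that ordering, reusing the players table from the question:
UPDATE OR IGNORE players SET age = 32 WHERE user_name = 'steven';
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32);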
To have a pure UPSERT with no holes (for programmers) that doesn't rely on unique or other keys:
UPDATE players SET user_name='gil', age=32 WHERE user_name='george';
SELECT changes();
SELECT changes() will return the number of rows updated by the last query.
Then check whether the return value of changes() is 0; if so, execute:
INSERT INTO players (user_name, age) VALUES ('gil', 32);
Option 1: Insert -> Update
If you would like to avoid both changes()=0 and INSERT OR IGNORE, even if you cannot afford deleting the row, you can use this logic:
First, insert (if not exists) and then update by filtering with the unique key.
Example
-- Table structure
CREATE TABLE players (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_name VARCHAR (255) NOT NULL
UNIQUE,
age INTEGER NOT NULL
);
-- Insert if NOT exists
-- (filter on the unique key only; also matching on age would attempt a duplicate insert when the user exists with a different age)
INSERT INTO players (user_name, age)
SELECT 'johnny', 20
WHERE NOT EXISTS (SELECT 1 FROM players WHERE user_name='johnny');
-- Update (will affect the row only if found)
-- no point updating user_name to 'johnny' since it's unique and we filter by it as well
UPDATE players
SET age=20
WHERE user_name='johnny';
Regarding Triggers
Notice: I haven't tested this to see which triggers are called, but I assume the following:
if the row does not exist:
BEFORE INSERT
INSERT using INSTEAD OF
AFTER INSERT
BEFORE UPDATE
UPDATE using INSTEAD OF
AFTER UPDATE
if the row does exist:
BEFORE UPDATE
UPDATE using INSTEAD OF
AFTER UPDATE
Option 2: Insert or replace - keep your own ID
In this way you can have a single SQL command:
-- Table structure
CREATE TABLE players (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_name VARCHAR (255) NOT NULL
UNIQUE,
age INTEGER NOT NULL
);
-- Single command to insert or update
-- (the sub-select finds the existing id by the unique key, so the row keeps its id)
INSERT OR REPLACE INTO players
(id, user_name, age)
VALUES ((SELECT id FROM players WHERE user_name='johnny'),
        'johnny',
        20);
Edit: added option 2.
You can also just add an ON CONFLICT REPLACE clause to your user_name unique constraint and then just INSERT away, leaving it to SQLite to figure out what to do in case of a conflict. See: https://sqlite.org/lang_conflict.html.
Also note the sentence regarding delete triggers: When the REPLACE conflict resolution strategy deletes rows in order to satisfy a constraint, delete triggers fire if and only if recursive triggers are enabled.
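A minimal sketch of that setup, assuming the players schema used in the other answers:
CREATE TABLE players (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_name TEXT NOT NULL UNIQUE ON CONFLICT REPLACE,
    age INTEGER NOT NULL
);
-- A plain INSERT now replaces any existing row with the same user_name
INSERT INTO players (user_name, age) VALUES ('steven', 32);
Keep in mind that, as with INSERT OR REPLACE, the conflicting row is deleted, so the replacement row gets a new autoincrement id.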
For those who do not have the latest version of SQLite available, you can still do it in a single statement using INSERT OR REPLACE, but beware that you need to set all the values. This "clever" SQL works by using a left join of the table you are inserting into / updating, together with ifnull:
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table test( id varchar(20) PRIMARY KEY, value int, value2 int )")
cur.executemany("insert into test (id, value, value2) values (:id, :value, :value2)",
                [{'id': 'A', 'value': 1, 'value2': 8}, {'id': 'B', 'value': 3, 'value2': 10}])
cur.execute('select * from test')
print(cur.fetchall())
con.commit()

cur = con.cursor()
# upsert using insert or replace.
# when id is found it should modify value but ignore value2
# when id is not found it will enter a record with value and value2
upsert = '''
insert or replace into test
select d.id, d.value, ifnull(t.value2, d.value2)
from ( select :id as id, :value as value, :value2 as value2 ) d
left join test t on d.id = t.id
'''
upsert_data = [{'id': 'B', 'value': 4, 'value2': 5},
               {'id': 'C', 'value': 3, 'value2': 12}]
cur.executemany(upsert, upsert_data)
cur.execute('select * from test')
print(cur.fetchall())
The first few lines of that code set up the table, with a single ID primary key column and two values. It then enters data with IDs 'A' and 'B'.
The second section creates the 'upsert' text, and calls it for 2 rows of data, one with the ID of 'B' which is found and one with 'C' which is not found.
When you run it, you'll find the data at the end produces
$python3 main.py
[('A', 1, 8), ('B', 3, 10)]
[('A', 1, 8), ('B', 4, 10), ('C', 3, 12)]
B 'updated' its value to 4, but the incoming value2 (5) was ignored; C was inserted.
Note: this does not work if your table has an auto-incremented primary key as INSERT OR REPLACE will replace the number with a new one.
A slight modification to add such a column
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table test( pkey integer primary key autoincrement not null, id varchar(20) UNIQUE not null, value int, value2 int )")
cur.executemany("insert into test (id, value, value2) values (:id, :value, :value2)",
                [{'id': 'A', 'value': 1, 'value2': 8}, {'id': 'B', 'value': 3, 'value2': 10}])
cur.execute('select * from test')
print(cur.fetchall())
con.commit()

cur = con.cursor()
# upsert using insert or replace.
# when id is found it should modify value but ignore value2
# when id is not found it will enter a record with value and value2
upsert = '''
insert or replace into test (id, value, value2)
select d.id, d.value, ifnull(t.value2, d.value2)
from ( select :id as id, :value as value, :value2 as value2 ) d
left join test t on d.id = t.id
'''
upsert_data = [{'id': 'B', 'value': 4, 'value2': 5},
               {'id': 'C', 'value': 3, 'value2': 12}]
cur.executemany(upsert, upsert_data)
cur.execute('select * from test')
print(cur.fetchall())
output is now:
$python3 main.py
[(1, 'A', 1, 8), (2, 'B', 3, 10)]
[(1, 'A', 1, 8), (3, 'B', 4, 10), (4, 'C', 3, 12)]
Note that pkey 2 is replaced with 3 for id 'B'.
This is therefore not ideal but is a good solution when:
You don't have an auto-generated primary key
You want to create an 'upsert' query with bound parameters
You want to use executemany() to merge in multiple rows of data in one go.
