SQLite UPSERT / UPDATE OR INSERT - database

I need to perform UPSERT / INSERT OR UPDATE against a SQLite Database.
There is the command INSERT OR REPLACE, which in many cases can be useful. But if you want to keep your autoincrement IDs in place because of foreign keys, it does not work: it deletes the row and creates a new one, so the new row gets a new ID.
This would be the table:
players (primary key on id, user_name unique)
| id   | user_name | age |
|------|-----------|-----|
| 1982 | johnny    | 23  |
| 1983 | steven    | 29  |
| 1984 | pepee     | 40  |
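The following minimal Python `sqlite3` sketch reproduces the problem. It assumes that the `players` table above is declared with `INTEGER PRIMARY KEY AUTOINCREMENT` and a UNIQUE `user_name`. A plain `INSERT OR REPLACE` that conflicts on `user_name` removes steven's row and inserts a new one, and the new row receives a new id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE players (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_name TEXT NOT NULL UNIQUE,
    age INTEGER NOT NULL)""")
con.executemany("INSERT INTO players (id, user_name, age) VALUES (?, ?, ?)",
                [(1982, "johnny", 23), (1983, "steven", 29), (1984, "pepee", 40)])

# The UNIQUE conflict on user_name makes SQLite delete the old 'steven' row
# and insert a brand-new one, which is given a freshly generated id.
con.execute("INSERT OR REPLACE INTO players (user_name, age) VALUES ('steven', 32)")
steven_id = con.execute("SELECT id FROM players WHERE user_name = 'steven'").fetchone()[0]
print(steven_id)  # 1985, not 1983
```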

Q&A Style
Well, after researching and fighting with the problem for hours, I found that there are two ways to accomplish this. Which one applies depends on the structure of your table and on whether you have foreign key constraints enabled to maintain integrity. I'd like to share this in a clean format to save time for people who may be in my situation.
Option 1: You can afford deleting the row
In other words, you don't have foreign keys, or if you do, your SQLite engine is configured so that there are no integrity exceptions. The way to go is INSERT OR REPLACE. If you try to insert/update a player that already exists (same id or same user_name), the SQLite engine deletes that row and inserts the data you provide. Now the question is: how do you keep the old ID?
Let's say we want to UPSERT with the data user_name='steven' and age=32.
Look at this code:
INSERT OR REPLACE INTO players (id, user_name, age)
VALUES (
coalesce((select id from players where user_name='steven'),
(select max(id) from players) + 1),
'steven',
32);
The trick is in coalesce. It returns the id of the user 'steven' if there is one. Otherwise it returns a fresh id (max(id) + 1).
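Below is a runnable sketch of the coalesce trick, assuming the same `players` schema (AUTOINCREMENT id, UNIQUE user_name). The id is preserved, but the row is still physically deleted and re-inserted, so delete side effects such as ON DELETE CASCADE still matter.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE players (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_name TEXT NOT NULL UNIQUE,
    age INTEGER NOT NULL)""")
con.executemany("INSERT INTO players (id, user_name, age) VALUES (?, ?, ?)",
                [(1982, "johnny", 23), (1983, "steven", 29), (1984, "pepee", 40)])

# coalesce() reuses the existing id if the user exists, else max(id) + 1.
UPSERT = """
INSERT OR REPLACE INTO players (id, user_name, age)
VALUES (
    coalesce((SELECT id FROM players WHERE user_name = :name),
             (SELECT max(id) FROM players) + 1),
    :name,
    :age)
"""
con.execute(UPSERT, {"name": "steven", "age": 32})  # existing user: id 1983 reused
con.execute(UPSERT, {"name": "gil", "age": 50})     # new user: id 1984 + 1
rows = con.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)
```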
Option 2: You cannot afford deleting the row
After monkeying around with the previous solution, I realized that in my case it could end up destroying data, because this ID is used as a foreign key in another table. Besides, I created the table with the clause ON DELETE CASCADE, which means that deleting the row would silently delete the related data. Dangerous.
So I first thought of an IF clause, but SQLite only has CASE. CASE can't be used (or at least I did not manage it) to perform an UPDATE query if EXISTS(select id from players where user_name='steven') and an INSERT if not. No go.
And then, finally, I used brute force, with success. The logic is: for each UPSERT you want to perform, first execute an INSERT OR IGNORE to make sure there is a row with our user, and then execute an UPDATE query with exactly the same data you tried to insert.
Same data as before: user_name='steven' and age=32.
-- make sure it exists
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32);
-- make sure it has the right data
UPDATE players SET user_name='steven', age=32 WHERE user_name='steven';
And that's all!
EDIT
As Andy has commented, inserting first and then updating may fire triggers more often than expected. In my opinion this is not a data safety issue, but it is true that firing unnecessary events makes little sense. Therefore, an improved solution would be:
-- Try to update any existing row
UPDATE players SET age=32 WHERE user_name='steven';
-- Make sure it exists
INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32);

This is a late answer. Starting with SQLite 3.24.0, released on June 4, 2018, there is finally support for the UPSERT clause, following PostgreSQL syntax.
INSERT INTO players (user_name, age)
VALUES('steven', 32)
ON CONFLICT(user_name)
DO UPDATE SET age=excluded.age;
Note: For those who have to use a version of SQLite earlier than 3.24.0, please refer to this answer below (posted by me, @MarqueIV).
However if you do have the option to upgrade, you are strongly encouraged to do so as unlike my solution, the one posted here achieves the desired behavior in a single statement. Plus you get all the other features, improvements and bug fixes that usually come with a more recent release.

Here's an approach that doesn't require the brute-force 'ignore' which would only work if there was a key violation. This way works based on any conditions you specify in the update.
Try this...
-- Try to update any existing row
UPDATE players
SET age=32
WHERE user_name='steven';
-- If no update happened (i.e. the row didn't exist) then insert one
INSERT INTO players (user_name, age)
SELECT 'steven', 32
WHERE (Select Changes() = 0);
How It Works
The 'magic sauce' here is using Changes() in the Where clause. Changes() represents the number of rows affected by the last operation, which in this case is the update.
In the above example, if there are no changes from the update (i.e. the record doesn't exist) then Changes() = 0 so the Where clause in the Insert statement evaluates to true and a new row is inserted with the specified data.
If the Update did update an existing row, then Changes() = 1 (or more accurately, not zero if more than one row was updated), so the 'Where' clause in the Insert now evaluates to false and thus no insert will take place.
The beauty of this is there's no brute-force needed, nor unnecessarily deleting, then re-inserting data which may result in messing up downstream keys in foreign-key relationships.
Additionally, since it's just a standard Where clause, it can be based on anything you define, not just key violations. Likewise, you can use Changes() in combination with anything else you want/need anywhere expressions are allowed.
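A minimal Python sketch of the Changes() approach follows. The table deliberately has no UNIQUE constraint on `user_name`, which shows that the technique does not depend on a key violation. The UPDATE and the INSERT must run on the same connection, because changes() is per connection.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, user_name TEXT, age INTEGER)")
con.execute("INSERT INTO players (user_name, age) VALUES ('steven', 29)")

def upsert(name: str, age: int) -> None:
    con.execute("UPDATE players SET age = ? WHERE user_name = ?", (age, name))
    # changes() still reports the UPDATE above, so the INSERT only
    # produces a row when the UPDATE touched nothing.
    con.execute("INSERT INTO players (user_name, age) SELECT ?, ? WHERE changes() = 0",
                (name, age))

upsert("steven", 32)  # row exists: updated, nothing inserted
upsert("johnny", 23)  # row missing: inserted
rows = con.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)
```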

The problem with all the answers presented is that they completely ignore triggers (and probably other side effects).
A solution like
INSERT OR IGNORE ...
UPDATE ...
causes both triggers to execute (first the insert trigger, then the update trigger) when the row does not exist.
The proper solution is
UPDATE OR IGNORE ...
INSERT OR IGNORE ...
In that case only one of the two statements actually changes a row, whether or not the row exists.
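The sketch below compares the two orders by counting trigger firings in a hypothetical `trigger_log` table. It starts from a missing row and uses AFTER triggers only. It does not cover how BEFORE triggers behave for an insert that ends up ignored.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE players (id INTEGER PRIMARY KEY, user_name TEXT UNIQUE, age INTEGER);
        CREATE TABLE trigger_log (event TEXT);
        CREATE TRIGGER log_insert AFTER INSERT ON players
            BEGIN INSERT INTO trigger_log VALUES ('insert'); END;
        CREATE TRIGGER log_update AFTER UPDATE ON players
            BEGIN INSERT INTO trigger_log VALUES ('update'); END;
    """)
    return con

def fired(con: sqlite3.Connection) -> list:
    return [e for (e,) in con.execute("SELECT event FROM trigger_log ORDER BY rowid")]

# Row does not exist yet: insert-then-update fires both triggers.
a = make_db()
a.execute("INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32)")
a.execute("UPDATE players SET age = 32 WHERE user_name = 'steven'")

# Row does not exist yet: update-then-insert fires only the insert trigger.
b = make_db()
b.execute("UPDATE OR IGNORE players SET age = 32 WHERE user_name = 'steven'")
b.execute("INSERT OR IGNORE INTO players (user_name, age) VALUES ('steven', 32)")

print(fired(a), fired(b))  # ['insert', 'update'] ['insert']
```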

To have a pure UPSERT with no holes (for programmers) that doesn't rely on unique or other keys:
UPDATE players SET user_name='gil', age=32 WHERE user_name='george';
SELECT changes();
SELECT changes() returns the number of rows modified by the most recent INSERT, UPDATE or DELETE statement.
Then check if return value from changes() is 0, if so execute:
INSERT INTO players (user_name, age) VALUES ('gil', 32);
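The check can be done on the application side, as the answer describes. The Python sketch below assumes a `players` table without any UNIQUE constraint and a hypothetical helper named `upsert_player`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE players (id INTEGER PRIMARY KEY, user_name TEXT, age INTEGER)")

def upsert_player(old_name: str, new_name: str, age: int) -> None:
    con.execute("UPDATE players SET user_name = ?, age = ? WHERE user_name = ?",
                (new_name, age, old_name))
    # SELECT changes() returns how many rows the UPDATE above modified.
    changed = con.execute("SELECT changes()").fetchone()[0]
    if changed == 0:
        con.execute("INSERT INTO players (user_name, age) VALUES (?, ?)", (new_name, age))

upsert_player("george", "gil", 32)  # no 'george' row: 'gil' is inserted
upsert_player("gil", "gil", 33)     # 'gil' exists now: updated in place
rows = con.execute("SELECT user_name, age FROM players").fetchall()
print(rows)
```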

Option 1: Insert -> Update
If you'd like to avoid both changes()=0 and INSERT OR IGNORE, and you cannot afford to delete the row, you can use this logic:
First, insert (if not exists) and then update by filtering with the unique key.
Example
-- Table structure
CREATE TABLE players (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_name VARCHAR (255) NOT NULL
UNIQUE,
age INTEGER NOT NULL
);
-- Insert if NOT exists
INSERT INTO players (user_name, age)
SELECT 'johnny', 20
WHERE NOT EXISTS (SELECT 1 FROM players WHERE user_name='johnny');
-- Update (will affect row, only if found)
-- no point to update user_name to 'johnny' since it's unique, and we filter by it as well
UPDATE players
SET age=20
WHERE user_name='johnny';
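Here is a runnable version of Option 1, assuming the table structure above. The NOT EXISTS test filters on the unique key alone. If it also required age=20, an existing johnny with a different age would make the INSERT fail on the UNIQUE constraint.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE players (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_name VARCHAR(255) NOT NULL UNIQUE,
    age INTEGER NOT NULL)""")
con.execute("INSERT INTO players (id, user_name, age) VALUES (1982, 'johnny', 23)")

def upsert(name: str, age: int) -> None:
    # Insert only if no row with this user_name exists yet.
    con.execute("""INSERT INTO players (user_name, age)
                   SELECT :name, :age
                   WHERE NOT EXISTS (SELECT 1 FROM players WHERE user_name = :name)""",
                {"name": name, "age": age})
    # Then update; this hits the row whether it was just inserted or already there.
    con.execute("UPDATE players SET age = :age WHERE user_name = :name",
                {"name": name, "age": age})

upsert("johnny", 20)  # existing row: age 23 -> 20, id stays 1982
upsert("pepee", 40)   # new row
rows = con.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)
```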
Regarding Triggers
Notice: I haven't tested which triggers are called, but I assume the following:
If the row does not exist:
BEFORE INSERT
INSERT using INSTEAD OF
AFTER INSERT
BEFORE UPDATE
UPDATE using INSTEAD OF
AFTER UPDATE
If the row does exist:
BEFORE UPDATE
UPDATE using INSTEAD OF
AFTER UPDATE
Option 2: Insert or replace - keep your own ID
This way you can use a single SQL command:
-- Table structure
CREATE TABLE players (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_name VARCHAR (255) NOT NULL
UNIQUE,
age INTEGER NOT NULL
);
-- Single command to insert or update
INSERT OR REPLACE INTO players
(id, user_name, age)
VALUES ((SELECT id from players WHERE user_name='johnny'),
'johnny',
20);
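The following Python sketch runs Option 2, assuming the table structure above. If the subquery finds no row, it yields NULL, and SQLite then generates a new id for the INTEGER PRIMARY KEY column. An existing row is still deleted and re-inserted, but it keeps its id.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE players (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_name VARCHAR(255) NOT NULL UNIQUE,
    age INTEGER NOT NULL)""")
con.execute("INSERT INTO players (id, user_name, age) VALUES (1982, 'johnny', 23)")

UPSERT = """
INSERT OR REPLACE INTO players (id, user_name, age)
VALUES ((SELECT id FROM players WHERE user_name = :name), :name, :age)
"""
con.execute(UPSERT, {"name": "johnny", "age": 20})  # existing id 1982 reused
con.execute(UPSERT, {"name": "steven", "age": 29})  # NULL id: a new one is generated
rows = con.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)
```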
Edit: added option 2.

You can also just add an ON CONFLICT REPLACE clause to your user_name unique constraint and then simply INSERT, leaving it to SQLite to decide what to do in case of a conflict. See https://sqlite.org/lang_conflict.html.
Also note the sentence regarding delete triggers: When the REPLACE conflict resolution strategy deletes rows in order to satisfy a constraint, delete triggers fire if and only if recursive triggers are enabled.
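The Python sketch below declares the constraint-level clause, assuming an AUTOINCREMENT id. A plain INSERT then resolves the conflict by itself. Because REPLACE deletes the old row, steven ends up with a new id, just as with INSERT OR REPLACE.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE players (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_name TEXT NOT NULL UNIQUE ON CONFLICT REPLACE,
    age INTEGER NOT NULL)""")
con.executemany("INSERT INTO players (user_name, age) VALUES (?, ?)",
                [("johnny", 23), ("steven", 29)])

# Plain INSERT: the UNIQUE constraint's own REPLACE policy handles the conflict.
con.execute("INSERT INTO players (user_name, age) VALUES ('steven', 32)")
rows = con.execute("SELECT id, user_name, age FROM players ORDER BY id").fetchall()
print(rows)  # [(1, 'johnny', 23), (3, 'steven', 32)]
```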

For those who have the latest version of SQLite available, you can still do it in a single statement using INSERT OR REPLACE, but beware that you need to set all the values. This "clever" SQL works by using a LEFT JOIN on the table into which you are inserting/updating, together with ifnull():
import sqlite3
con = sqlite3.connect( ":memory:" )
cur = con.cursor()
cur.execute("create table test( id varchar(20) PRIMARY KEY, value int, value2 int )")
cur.executemany("insert into test (id, value, value2) values (:id, :value, :value2)",
[ {'id': 'A', 'value' : 1, 'value2' : 8 }, {'id': 'B', 'value' : 3, 'value2' : 10 } ] )
cur.execute('select * from test')
print( cur.fetchall())
con.commit()
cur = con.cursor()
# upsert using insert or replace.
# when id is found it should modify value but ignore value2
# when id is not found it will enter a record with value and value2
upsert = '''
insert or replace into test
select d.id, d.value, ifnull(t.value2, d.value2) from ( select :id as id, :value as value, :value2 as value2 ) d
left join test t on d.id = t.id
'''
upsert_data = [ { 'id' : 'B', 'value' : 4, 'value2' : 5 },
{ 'id' : 'C', 'value' : 3, 'value2' : 12 } ]
cur.executemany( upsert, upsert_data )
cur.execute('select * from test')
print( cur.fetchall())
The first few lines of that code set up the table, with a single id primary key column and two value columns. It then enters data with ids 'A' and 'B'.
The second section creates the 'upsert' text and executes it for two rows of data: one with the id 'B', which is found, and one with 'C', which is not.
When you run it, you'll find the data at the end produces
$python3 main.py
[('A', 1, 8), ('B', 3, 10)]
[('A', 1, 8), ('B', 4, 10), ('C', 3, 12)]
B's value was 'updated' to 4, but its value2 (5) was ignored, and C was inserted.
Note: this does not work if your table has an auto-incremented primary key as INSERT OR REPLACE will replace the number with a new one.
A slight modification to add such a column:
import sqlite3
con = sqlite3.connect( ":memory:" )
cur = con.cursor()
cur.execute("create table test( pkey integer primary key autoincrement not null, id varchar(20) UNIQUE not null, value int, value2 int )")
cur.executemany("insert into test (id, value, value2) values (:id, :value, :value2)",
[ {'id': 'A', 'value' : 1, 'value2' : 8 }, {'id': 'B', 'value' : 3, 'value2' : 10 } ] )
cur.execute('select * from test')
print( cur.fetchall())
con.commit()
cur = con.cursor()
# upsert using insert or replace.
# when id is found it should modify value but ignore value2
# when id is not found it will enter a record with value and value2
upsert = '''
insert or replace into test (id, value, value2)
select d.id, d.value, ifnull(t.value2, d.value2) from ( select :id as id, :value as value, :value2 as value2 ) d
left join test t on d.id = t.id
'''
upsert_data = [ { 'id' : 'B', 'value' : 4, 'value2' : 5 },
{ 'id' : 'C', 'value' : 3, 'value2' : 12 } ]
cur.executemany( upsert, upsert_data )
cur.execute('select * from test')
print( cur.fetchall())
The output is now:
$python3 main.py
[(1, 'A', 1, 8), (2, 'B', 3, 10)]
[(1, 'A', 1, 8), (3, 'B', 4, 10), (4, 'C', 3, 12)]
Note that pkey 2 is replaced with 3 for id 'B'.
This is therefore not ideal but is a good solution when:
You don't have an auto-generated primary key
You want to create an 'upsert' query with bound parameters
You want to use executemany() to merge in multiple rows of data in one go.

Related

Select a large volume of data with LIKE in SQL Server

I have a table with an ID column.
The ID column looks like this: IDxxxxyyy
where each x is a digit from 0 to 9.
I have to select the rows with IDs like ID0xxx% to ID3xxx%. There will be around 4000 IDs with the % wildcard, from ID0000% to ID3999%.
It is like combining LIKE with IN
Select * from TABLE where ID in (ID0000%,ID0001%,...,ID3999%)
I cannot figure out how to select with this condition.
If you have any idea, please help.
Thank you so much!
You can use pattern matching with LIKE. e.g.
WHERE ID LIKE 'ID[0-3][0-9][0-9][0-9]%'
This will match any string that:
Starts with ID (ID)
Then has a third character that is a number between 0 and 3 [0-3]
Then has 3 further numbers ([0-9][0-9][0-9])
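To check which IDs the pattern accepts, here is a rough Python regular-expression equivalent of 'ID[0-3][0-9][0-9][0-9]%'. It is only an illustration: the helper name `matches` is hypothetical. Note that this regex is case-sensitive, while SQL Server's LIKE follows the column's collation.

```python
import re

# "ID", one digit 0-3, three more digits, then anything (the % wildcard).
LIKE_EQUIVALENT = re.compile(r"ID[0-3][0-9]{3}.*", re.DOTALL)

def matches(identifier: str) -> bool:
    return LIKE_EQUIVALENT.fullmatch(identifier) is not None

samples = ["ID0000abc", "ID3999xyz", "ID4000abc", "ID12ab", "XID0000"]
selected = [s for s in samples if matches(s)]
print(selected)  # ['ID0000abc', 'ID3999xyz']
```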
This is not likely to perform well at all. If it is not too late to alter your table design, I would separate out the components of your Identifier and store them separately, then use a computed column to store your full id e.g.
CREATE TABLE T
(
NumericID INT NOT NULL,
YYY CHAR(3) NOT NULL, -- Or whatever type makes up yyy in your ID
FullID AS CONCAT('ID', FORMAT(NumericID, '0000'), YYY),
CONSTRAINT PK_T__NumericID_YYY PRIMARY KEY (NumericID, YYY)
);
Then your query is as simple as:
SELECT FullID
FROM T
WHERE NumericID >= 0
AND NumericID < 4000;
This is significantly easier to read and write, and will be significantly faster too.
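The same split-column design can be tried in SQLite from Python. In this port, the full ID is built at query time with `printf()` rather than a SQL Server computed column, and the table and column names are assumptions. The range filter is a plain numeric comparison on the leading primary-key column instead of a string pattern.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE t (
    numeric_id INTEGER NOT NULL,
    yyy TEXT NOT NULL,
    PRIMARY KEY (numeric_id, yyy))""")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(0, "abc"), (3999, "xyz"), (4000, "abc"), (1234, "def")])

# Rebuild 'ID' + zero-padded number + yyy only for the rows in range.
rows = con.execute("""
    SELECT printf('ID%04d%s', numeric_id, yyy) AS full_id
    FROM t
    WHERE numeric_id >= 0 AND numeric_id < 4000
    ORDER BY numeric_id""").fetchall()
full_ids = [r[0] for r in rows]
print(full_ids)  # ['ID0000abc', 'ID1234def', 'ID3999xyz']
```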
This should do it: it gets all the IDs that start with IDx, where x goes from 0 to 3.
Select * from TABLE where ID LIKE 'ID[0-3]%'
You can try:
Select * from TABLE where id like 'ID[0-3][0-9]%[a-zA-Z]';

Count rows that follow other rows in a single table, both restricted by a WHERE clause

I'm using SQL Server 2014.
I have a table that contains several million events. The primary key is composed of three columns:
Time DateTime
user (bigint)
context (varchar(50))
I have another column with a value (nvarchar(max))
I need to count rows restricted on
context = 'somecontext' and value = 'value2'
that follows in time rows restricted on
context = 'somecontext' and value = 'value1'
for the same user.
For example, with the following records:
Time                     user             context      value
2019-02-22 14:56:57.710  359586015014836  somecontext  value1
2019-02-22 15:13:42.887  359586015014836  somecontext  value2   <------ I need to count only these rows.
It is "recorded" 15 min after the first one, and the user and context are the same.
I have seen other similar questions like this one or that one.
Should I JOIN the table to itself? Use subqueries? Maybe a CTE? I'm concerned about performance, which should be optimal.
The idea would be to use query features available in this version of the DB engine.
If the example I gave in the comment is what you want, then you can use the following code.
It assumes that you want to select all the rows where context = 'c1', the current value = 'v1' and the next value = 'v3' when ordered by time:
declare @t table
(
Time_ DateTime,
user_ bigint,
context varchar(50),
value_ varchar(50)
);
insert into @t values
('20000101', 1, 'c1', 'v1'),
('20000102', 1, 'c2', 'v3'),
('20000103', 1, 'c1', 'v3'),
('20000104', 2, 'c1', 'v1'),
('20000105', 2, 'c1', 'v4'),
('20000106', 2, 'c1', 'v2');
with cte as
(
select *,
lead(value_) over(partition by user_ order by time_) as next_value
from @t
where context = 'c1'
)
select *
from cte
where value_ = 'v1'
and next_value = 'v3';
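The sketch below applies the same window-function idea to the question's own data. It uses Python's `sqlite3`, which needs SQLite 3.25 or later for window functions, and renames the columns to avoid keyword clashes (an assumption). It counts from the value2 side with LAG. That gives the same number as using LEAD from the value1 side, provided "follows" means "is the immediately next row for that user in that context", as the LEAD answer assumes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ev_time TEXT, user_id INTEGER, context TEXT, val TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("2019-02-22 14:56:57.710", 359586015014836, "somecontext", "value1"),
    ("2019-02-22 15:13:42.887", 359586015014836, "somecontext", "value2"),
    # Hypothetical extra user: a value2 row with no value1 before it.
    ("2019-02-22 15:20:00.000", 42, "somecontext", "value2"),
])

COUNT_SQL = """
WITH ordered AS (
    SELECT val,
           LAG(val) OVER (PARTITION BY user_id ORDER BY ev_time) AS prev_val
    FROM events
    WHERE context = 'somecontext'
)
SELECT COUNT(*) FROM ordered
WHERE val = 'value2' AND prev_val = 'value1'
"""
count = con.execute(COUNT_SQL).fetchone()[0]
print(count)  # 1
```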

How to update table based on a json map?

If I have this table
CREATE TABLE tmp (
a integer,
b integer,
c text
);
INSERT INTO tmp (a, b, c) VALUES (1, 2, 'foo');
And this json:
{
"a": 4,
"c": "bar"
}
Where the keys map to the column names, and the values are the new values.
How can I update the tmp table without touching columns that aren't in the map?
I thought about constructing a dynamic SQL UPDATE string that can be executed in PL/pgSQL, but it seems the number of arguments passed to USING must be predetermined. The actual number of arguments is determined by the number of keys in the map, which is dynamic, so this seems like a dead end.
I know I can update the table with multiple UPDATE statements as I loop over the keys. The problem is that the table has a trigger that records revisions (by inserting the changed columns into another table), so all the columns must be updated in a single UPDATE statement.
I wonder if it's possible to dynamically update a table with a json map?
Use coalesce(). Example table:
drop table if exists my_table;
create table my_table(id int primary key, a int, b text, c date);
insert into my_table values (1, 1, 'old text', '2017-01-01');
and query:
with jsondata(jdata) as (
values ('{"id": 1, "b": "new text"}'::jsonb)
)
update my_table set
a = coalesce((jdata->>'a')::int, a),
b = coalesce((jdata->>'b')::text, b),
c = coalesce((jdata->>'c')::date, c)
from jsondata
where id = (jdata->>'id')::int;
select * from my_table;
id | a | b | c
----+---+----------+------------
1 | 1 | new text | 2017-01-01
(1 row)
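A different, application-side approach also satisfies the single-statement requirement. This Python/SQLite sketch is not PL/pgSQL: it builds one UPDATE that contains only the keys present in the map and binds the values as parameters. The `id` column and the `ALLOWED_COLUMNS` whitelist are assumptions. Unlike the coalesce() version, a key explicitly set to null in the map would write NULL.

```python
import json
import sqlite3

# Hypothetical whitelist: column names cannot be bound as parameters, so only
# known names are ever spliced into the SQL text.
ALLOWED_COLUMNS = {"a", "b", "c"}

def update_from_map(con: sqlite3.Connection, row_id: int, changes: dict) -> None:
    unknown = set(changes) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    if not changes:
        return
    columns = sorted(changes)
    assignments = ", ".join(f"{col} = ?" for col in columns)
    # One UPDATE statement however many keys the map has; values are bound.
    con.execute(f"UPDATE tmp SET {assignments} WHERE id = ?",
                [changes[col] for col in columns] + [row_id])

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tmp (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
con.execute("INSERT INTO tmp (id, a, b, c) VALUES (1, 1, 2, 'foo')")
update_from_map(con, 1, json.loads('{"a": 4, "c": "bar"}'))
row = con.execute("SELECT a, b, c FROM tmp WHERE id = 1").fetchone()
print(row)  # (4, 2, 'bar')
```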

SQL Update NULL value from select statement query

I'm new to posting on this site, but been using it for a while to get assistance to SQL queries.
I have an issue that I'm trying to resolve. I have 2 columns in a query which are machine and ID, for some machines the ID will be NULL, but for others they will have an ID value as set out below.
Machine ID
test1 3
test12 NULL
test3 4
test4 NULL
As the IDs will be present in the table, I need to update the NULL values when the machine name is similar to one that has a value. For example, test1 and test12 should both have ID 3, but test12 shows NULL. I want to replace the NULL for test12 with ID = 3, because the machine names are similar.
I have tried COALESCE, ISNULL and CASE, which will all update the values, but they require me to know the value, and I won't know it until I have run the SELECT statement.
Any ideas on how to resolve this please?
As noted in the comments you will have to work out your match formula and specify it in the join condition. I believe the query you are looking for is:
Create Table Machines (Name Varchar(8000), ID Int)
Insert Into Machines Values ('test1', 3)
Insert Into Machines Values ('test12', Null)
Insert Into Machines Values ('test3', 4)
Insert Into Machines Values ('test4', Null)
Insert Into Machines Values ('test89', Null)
Insert Into Machines Values ('test8', 5)
Insert Into Machines Values ('test64', Null)
Update M1 Set M1.ID = M2.ID
From Machines M1
Join (Select Left(Name,5) NamePrefix, Max(ID) ID
From Machines
Where ID Is Not Null
Group By Left(Name,5)) M2
On Left(M1.Name, 5) = M2.NamePrefix
Where M1.ID Is Null
Select * From Machines
Note that I used a group by in the joined query in case multiple rows match and we only want one value returned. You can use window functions or other logic instead of the group by if you want to pick specifically which match is chosen.
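Here is a port of the same prefix-matching update to SQLite via Python, using the same sample rows. It uses a correlated subquery instead of T-SQL's UPDATE ... FROM, with MAX() picking a single id when several rows share a prefix. Prefixes with no known id stay NULL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE machines (name TEXT, id INTEGER)")
con.executemany("INSERT INTO machines VALUES (?, ?)", [
    ("test1", 3), ("test12", None), ("test3", 4), ("test4", None),
    ("test89", None), ("test8", 5), ("test64", None)])

# Fill each NULL id from a machine with the same 5-character prefix.
con.execute("""
    UPDATE machines
    SET id = (SELECT MAX(m2.id) FROM machines AS m2
              WHERE m2.id IS NOT NULL
                AND substr(m2.name, 1, 5) = substr(machines.name, 1, 5))
    WHERE id IS NULL""")
ids = dict(con.execute("SELECT name, id FROM machines"))
print(ids)
```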
Based on the requirements in your last comment:
There will be a number of records in the table, grouped by the letters before and after the '-', i.e. AB-CDE-L111, AB-CDE-L112, AB-CDE-L113, AB-CDE-L124, AB-CDE-L116 should all have an ID of 45 in the query. The next set of machines will be AB-CCC-L111, AB-CCC-L112, AB-CCC-L115, which should all have an ID of 47 in the query, and finally the last set of machines, AB-BBB-L113, AB-BBB-L144, AB-BBB-L115, AB-BBB-L120, should all have an ID of 50 in the query. When a machine returns a NULL ID in the query, I need to update the query results, not the table.
So a SELECT query to get you your results would be:
declare @machine table (Machine varchar(30) not null, ID int null)
insert into @machine
values ('AB-CDE-L111', NULL),
('AB-CDE-L112', NULL),
('AB-CDE-L113', 45),
('AB-CDE-L124', NULL),
('AB-CDE-L116', NULL),
('AB-CCC-L111', NULL),
('AB-CCC-L112', NULL),
('AB-CCC-L113', 47),
('AB-CCC-L124', NULL),
('AB-CCC-L116', NULL),
('AB-BBB-L111', NULL),
('AB-BBB-L112', NULL),
('AB-BBB-L113', 50),
('AB-BBB-L124', NULL),
('AB-BBB-L116', NULL)
select m1.Machine, m2.ID
from @machine m1
inner join @machine m2
on m2.ID is not null
and left(m2.Machine, 6) = left(m1.Machine, 6)
order by m1.Machine
This assumes that:
1) The prefix of the machine code always has the same number of characters.
2) That there is only one Machine in each group that has been assigned an ID.
If either of these assumptions is wrong then you may need to do extra string manipulation in the case of 1) and use some kind of function (ROW_NUMBER etc.) in the case of 2) to avoid duplicate rows (although you could just use SELECT DISTINCT if the IDs would be the same).

conditional "next value for sequence"

Scenario:
SQL Server 2012. A table named "Test" has two fields, "CounterNo" and "Value", both integers.
There are 4 sequence objects defined named sq1, sq2, sq3, sq4
I want to do these on inserts:
if CounterNo = 1 then Value = next value for sq1
if CounterNo = 2 then Value = next value for sq2
if CounterNo = 3 then Value = next value for sq3
I thought of creating a custom function and assigning it as the default value of the Value field. But when I tried, custom functions did not support NEXT VALUE FOR sequence objects.
Another way is to use a trigger, but that table already has one.
A stored procedure for inserts would be the best way, but Entity Framework 5 Code-First does not support it.
Can you suggest a way to achieve this?
(If you can show me how to do it with custom functions, you can also post it here. That is another question of mine.)
Update:
In reality there are 23 fields in that table, and primary keys are also set. I'm generating this counter value on the software side using a "counter table", but it is not good to generate counter values on the client side.
I'm using 4 sequence objects as counters because they represent different types of records.
If I use 4 counters on the same record at the same time, all of them generate their next values. I want only the related counter to generate its next value while the others stay the same.
I'm not sure I fully understand your use case, but maybe the following sample illustrates what you need.
Create Table Vouchers (
Id uniqueidentifier Not Null Default NewId()
, Discriminator varchar(100) Not Null
, VoucherNumber int Null
-- ...
, MoreData nvarchar(100) Null
);
go
Create Sequence InvoiceSequence AS int Start With 1 Increment By 1;
Create Sequence OrderSequence AS int Start With 1 Increment By 1;
go
Create Trigger TR_Voucher_Insert_VoucherNumer On Vouchers After Insert As
If Exists (Select 1 From inserted Where Discriminator = 'Invoice')
Update v
Set VoucherNumber = Next Value For InvoiceSequence
From Vouchers v Inner Join inserted i On (v.Id = i.Id)
Where i.Discriminator = 'Invoice';
If Exists (Select 1 From inserted Where Discriminator = 'Order')
Update v
Set VoucherNumber = Next Value For OrderSequence
From Vouchers v Inner Join inserted i On (v.Id = i.Id)
Where i.Discriminator = 'Order';
go
Insert Into Vouchers (Discriminator, MoreData)
Values ('Invoice', 'Much')
, ('Invoice', 'More')
, ('Order', 'Data')
, ('Invoice', 'And')
, ('Order', 'Again')
;
go
Select * From Vouchers;
Now Invoice- and Order-Numbers will be incremented independently. And as you can have multiple insert triggers on the same table, that shouldn't be an issue.
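SQLite has no sequence objects, so the per-type numbering in this answer can only be emulated there. The following Python/SQLite sketch substitutes a hypothetical `counters` table and a row-level AFTER INSERT trigger. Only the counter for the inserted row's discriminator advances.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Hypothetical counters table standing in for the per-type sequences.
    CREATE TABLE counters (discriminator TEXT PRIMARY KEY, last_value INTEGER NOT NULL);
    INSERT INTO counters VALUES ('Invoice', 0), ('Order', 0);

    CREATE TABLE vouchers (
        id INTEGER PRIMARY KEY,
        discriminator TEXT NOT NULL,
        voucher_number INTEGER,
        more_data TEXT
    );

    -- Bump only this row's counter, then copy its value into the new row.
    CREATE TRIGGER tr_voucher_number AFTER INSERT ON vouchers
    BEGIN
        UPDATE counters SET last_value = last_value + 1
        WHERE discriminator = NEW.discriminator;
        UPDATE vouchers
        SET voucher_number = (SELECT last_value FROM counters
                              WHERE discriminator = NEW.discriminator)
        WHERE id = NEW.id;
    END;

    INSERT INTO vouchers (discriminator, more_data)
    VALUES ('Invoice', 'Much'), ('Invoice', 'More'), ('Order', 'Data'),
           ('Invoice', 'And'), ('Order', 'Again');
""")
rows = con.execute("SELECT discriminator, voucher_number FROM vouchers ORDER BY id").fetchall()
print(rows)
```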
I think you're thinking about this in the wrong way. You have 3 values and these values are determined by another column. Switch it around, create 3 columns and remove the Counter column.
If you have a table with value1, value2 and value3 then the Counter value is implied by the column in which the value resides. Create a unique index on these three columns and add an identity column for a primary key and you're sorted; you can do it all in a stored procedure easily.
If you have four different types of records, use four different tables, with a separate identity column in each one.
If you need to see all the data together, then use a view to combine them:
create view v_AllTypes as
select * from type1 union all
select * from type2 union all
select * from type3 union all
select * from type4;
Alternatively, do the calculation of the sequence number on output:
select t.*,
row_number() over (partition by CounterNo order by t.id) as TypeSeqNum
from AllTypes t;
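The sketch below shows the "calculate on output" idea in SQLite through Python; window functions require SQLite 3.25 or later. The `all_types` table and its sample rows are hypothetical. Each CounterNo gets its own running number, ordered by the identity column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE all_types (
    id INTEGER PRIMARY KEY,
    counter_no INTEGER NOT NULL,
    payload TEXT)""")
con.executemany("INSERT INTO all_types (counter_no, payload) VALUES (?, ?)",
                [(1, "a"), (2, "b"), (1, "c"), (3, "d"), (2, "e"), (1, "f")])

# Number rows separately within each counter_no, in id order.
rows = con.execute("""
    SELECT id, counter_no,
           row_number() OVER (PARTITION BY counter_no ORDER BY id) AS type_seq_num
    FROM all_types
    ORDER BY id""").fetchall()
print(rows)
```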
Something seems amiss with your data model if it requires conditional updates to four identity columns.
