Conditional INSERT INTO statement in Postgres

I'm writing a booking procedure for a mock airline booking database and what I really want to do is something like this:
IF EXISTS (SELECT * FROM LeadCustomer
           WHERE FirstName = 'John' AND Surname = 'Smith')
THEN
    INSERT INTO LeadCustomer (Firstname, Surname, BillingAddress, email)
    VALUES ('John', 'Smith',
            '6 Brewery close, Buxton, Norfolk', 'cmp.testing#example.com');
But Postgres doesn't support IF statements without loading the PL/pgSQL extension. I was wondering if there was a way to do some equivalent of this or if there's just going to have to be some user interaction in this step?

That specific command can be done like this:
insert into LeadCustomer (Firstname, Surname, BillingAddress, email)
select
'John', 'Smith',
'6 Brewery close, Buxton, Norfolk', 'cmp.testing#example.com'
where not exists (
select 1 from leadcustomer where firstname = 'John' and surname = 'Smith'
);
It will insert the result of the select statement, and the select will only return a row if that customer does not exist.

As of version 9.5, PostgreSQL includes upsert support via INSERT ... ON CONFLICT DO UPDATE ....
The answer below is no longer relevant: Postgres 9.5 was released a couple of years later with a better solution.
Postgres doesn't have "upsert" functionality without adding new functions.
What you'll have to do is run the SELECT query and see if you have matching rows. If you don't, then insert it.
I know you're not wanting an upsert exactly, but it's pretty much the same.
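For completeness, here is a minimal sketch of the 9.5+ syntax. It assumes a unique constraint on (FirstName, Surname), which ON CONFLICT needs as a conflict target; the constraint name is hypothetical:
-- Assumed constraint, e.g.:
-- ALTER TABLE LeadCustomer ADD CONSTRAINT uq_leadcustomer_name UNIQUE (FirstName, Surname);
INSERT INTO LeadCustomer (Firstname, Surname, BillingAddress, email)
VALUES ('John', 'Smith', '6 Brewery close, Buxton, Norfolk', 'cmp.testing#example.com')
ON CONFLICT (FirstName, Surname) DO NOTHING;  -- or DO UPDATE SET ... to overwrite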

-- Use the following format to insert data into any table.
-- Note: "user" is a reserved word in Postgres and must be quoted,
-- and a column name containing a space must be quoted too.
create table "user"
(
    user_id      varchar(25) primary key,
    phone_num    numeric(15),
    failed_login int not null default 0,
    "Login time" timestamp
);

INSERT INTO "user" (user_id, phone_num, failed_login, "Login time")
VALUES ('12345', 123456789, 3, '2021-01-16 04:24:01.755');

Related

Snowflake - how to do multiple DML operations on same primary key in a specific order?

I am trying to set up continuous data replication into Snowflake. I receive the transactions that happened in the source system, and I need to apply them in Snowflake in the same order as in the source system. I am trying to use MERGE for this, but when there are multiple operations on the same key in the source system, MERGE does not work correctly: it either misses an operation or returns a "duplicate row detected during DML operation" error.
Please note that the transactions need to happen in the exact order, and it is not possible to take just the latest transaction for a key and apply only that (e.g. if a record has been INSERTed and then UPDATEd, in Snowflake too it needs to be inserted first and then updated, even though the insert is only a transient state).
Here is the example:
create or replace table employee_source (
    id int,
    first_name varchar(255),
    last_name varchar(255),
    operation_name varchar(255),
    binlogkey integer
);

create or replace table employee_destination (
    id int,
    first_name varchar(255),
    last_name varchar(255)
);
insert into employee_source values (1,'Wayne','Bells','INSERT',11);
insert into employee_source values (1,'Wayne','BellsT','UPDATE',12);
insert into employee_source values (2,'Anthony','Allen','INSERT',13);
insert into employee_source values (3,'Eric','Henderson','INSERT',14);
insert into employee_source values (4,'Jimmy','Smith','INSERT',15);
insert into employee_source values (1,'Wayne','Bellsa','UPDATE',16);
insert into employee_source values (1,'Wayner','Bellsat','UPDATE',17);
insert into employee_source values (2,'Anthony','Allen','DELETE',18);
MERGE INTO employee_destination AS T
USING (select * from employee_source order by binlogkey) AS S
ON T.id = S.id
WHEN NOT MATCHED AND S.operation_name = 'INSERT' THEN
    INSERT (id, first_name, last_name)
    VALUES (S.id, S.first_name, S.last_name)
WHEN MATCHED AND S.operation_name = 'UPDATE' THEN
    UPDATE SET T.first_name = S.first_name, T.last_name = S.last_name
WHEN MATCHED AND S.operation_name = 'DELETE' THEN
    DELETE;
I am expecting to see 'Bellsat' as the last name for employee id 1 in the employee_destination table after all rows get processed. Likewise, I should not see emp id 2 in the employee_destination table.
Is there any other alternative to MERGE to achieve this? Basically, something that goes over every single DML in the same order (using the binlogkey column for ordering).
Thanks.
You need to manipulate your source data to ensure that you only have one record per key/operation, otherwise the join will be non-deterministic and will (depending on your settings) either error or update using a random one of the applicable source records. This is covered in the documentation here: https://docs.snowflake.com/en/sql-reference/sql/merge.html#duplicate-join-behavior
In any case, why would you want to update a record only for it to be overwritten by another update? That would be incredibly inefficient.
Since your updates appear to include the new values for all columns, you can use a window function to get just the latest incoming change per key, and then merge those results into the target table. For example, the SELECT for that merge (with the window function to get only the latest change) would look like this:
with SOURCE_DATA as
(
select COLUMN1::int ID
,COLUMN2::string FIRST_NAME
,COLUMN3::string LAST_NAME
,COLUMN4::string OPERATION_NAME
,COLUMN5::int PROCESSING_ORDER
from values
(1,'Wayne','Bells','INSERT',11),
(1,'Wayne','BellsT','UPDATE',12),
(2,'Anthony','Allen','INSERT',13),
(3,'Eric','Henderson','INSERT',14),
(4,'Jimmy','Smith','INSERT',15),
(1,'Wayne','Bellsa','UPDATE',16),
(1,'Wayne','Bellsat','UPDATE',17),
(2,'Anthony','Allen','DELETE',18)
)
select * from SOURCE_DATA
qualify row_number() over (partition by ID order by PROCESSING_ORDER desc) = 1
That will produce a result set that has only the changes required to merge into the target table:
ID | FIRST_NAME | LAST_NAME | OPERATION_NAME | PROCESSING_ORDER
---|------------|-----------|----------------|-----------------
1  | Wayne      | Bellsat   | UPDATE         | 17
2  | Anthony    | Allen     | DELETE         | 18
3  | Eric       | Henderson | INSERT         | 14
4  | Jimmy      | Smith     | INSERT         | 15
You can then change the WHEN NOT MATCHED clause to remove the operation_name check: if a row is listed as an UPDATE but is not in the target table, that's because it was inserted by a previous operation within the new changes.
For the WHEN MATCHED clauses, you can use the operation_name to determine whether the row should be updated or deleted.
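Here is a minimal sketch of that revised MERGE (untested; the <> 'DELETE' guard on the NOT MATCHED branch is an assumption, covering rows whose final operation is DELETE and which never reached the target, so no action is needed for them):
MERGE INTO employee_destination AS T
USING (
    -- same window-function logic as above, applied directly to the source table
    select * from employee_source
    qualify row_number() over (partition by id order by binlogkey desc) = 1
) AS S
ON T.id = S.id
WHEN NOT MATCHED AND S.operation_name <> 'DELETE' THEN
    INSERT (id, first_name, last_name)
    VALUES (S.id, S.first_name, S.last_name)
WHEN MATCHED AND S.operation_name = 'UPDATE' THEN
    UPDATE SET T.first_name = S.first_name, T.last_name = S.last_name
WHEN MATCHED AND S.operation_name = 'DELETE' THEN
    DELETE;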

How to select the default values of a table?

In my app, when letting the user enter a new record, I want to preselect the database's default values.
Let's for example take this table:
CREATE TABLE pet (
ID INT NOT NULL,
name VARCHAR(255) DEFAULT 'noname',
age INT DEFAULT 1
)
I would like to do something like this:
SELECT DEFAULT VALUES FROM pet -- NOT WORKING
And it should return:
ID | name | age
--------------------
NULL | noname | 1
I would then let the user fill in the remaining fields, or let her change one of the defaults, before she clicks on "save".
How can I select the default values of a SQL Server table using T-SQL?
You don't "SELECT" the Default values, only insert them. A SELECT returns the rows from a table, you can't SELECT the DEFAULT VALUES as there's no such row inside the table.
You could do something silly, like use a TRANSACTION and roll it back, but as ID doesn't have a default value, and you don't define a value for it with DEFAULT VALUES, it'll fail in your scenario:
CREATE TABLE pet (
ID INT NOT NULL,
name VARCHAR(255) DEFAULT 'noname',
age INT DEFAULT 1
)
GO
BEGIN TRANSACTION;
INSERT INTO dbo.pet
OUTPUT inserted.*
DEFAULT VALUES;
ROLLBACK;
Msg 515, Level 16, State 2, Line 13
Cannot insert the value NULL into column 'ID', table 'Sandbox.dbo.pet'; column does not allow nulls. INSERT fails.
You can, therefore, just supply the values for your non-NULL columns:
BEGIN TRANSACTION;
INSERT INTO dbo.pet (ID)
OUTPUT inserted.*
VALUES(1);
ROLLBACK;
Which will output the "default" values:
ID|name |age
--|------|---
1|noname|1
Selecting the default values of all columns is not very straightforward and, as Heinzi wrote in his comment, requires a level of permissions you normally don't want your users to have.
That being said, a simple workaround would be to insert a record, select it back and display to the user, let the user decide what they want to change (if anything) and then when they submit the record - update the record (or delete the previous record and insert a new one).
That would require you to have some indication if the record was actually reviewed and updated by the user, but that's easy enough to accomplish by simply adding a bit column and setting it to 1 when updating the data.
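Here is a minimal sketch of that workaround (the reviewed flag column and the example values are assumptions, not part of the original schema):
-- Hypothetical flag to track whether the user has confirmed the row:
ALTER TABLE dbo.pet ADD reviewed bit NOT NULL DEFAULT 0;

-- Insert a placeholder row and return its default values to show the user:
INSERT INTO dbo.pet (ID)
OUTPUT inserted.*
VALUES (1);

-- Once the user confirms or edits the values:
UPDATE dbo.pet SET name = 'Rex', age = 3, reviewed = 1 WHERE ID = 1;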
As I have commented before, there is no need for this query, since you can press Alt+F1 on any table in your editor in Management Studio and it will give you every piece of information you need about the table.
select sys1.name as 'Name',
       replace(replace(
           case
               when object_definition(sys1.default_object_id) is null then 'No Default Value'
               else object_definition(sys1.default_object_id)
           end, '(', ''), ')', '') as 'Default value',
       information_schema.columns.data_type as 'Data type'
from sys.columns as sys1
left join information_schema.columns
       on sys1.name = information_schema.columns.column_name
where object_id = object_id('table_name')
  and information_schema.columns.table_name = 'table_name'
It seems like this might be solution:
SELECT * FROM (
SELECT
sys1.name AS COLUMN_NAME,
replace(replace(object_definition(sys1.default_object_id),'(',''),')','') AS DEFAULT_VALUE
FROM sys.columns AS sys1
LEFT JOIN information_schema.columns ON sys1.name = information_schema.columns.column_name
WHERE object_id = object_id('pet')
AND information_schema.columns.table_name = 'pet'
) AS SourceTable PIVOT(MAX(DEFAULT_VALUE) FOR COLUMN_NAME IN(ID, name, age)) AS PivotTable;
It returns:
ID |name |age
----|------|---
NULL|noname|1
Probably the column types are incorrect - but maybe I can live with that.
Thanks to @Nissus for providing an intermediate step to this.

Removing Duplicates with SQL Express 2017

I have a table of 120 million rows. About 8 million of those rows are duplicates depending on what value/column I use to determine duplicates. For argument sake, I'm testing out the email column vs multiple columns to see what happens with my data.
The file is about 10GB, so I cannot simply add another table to the database because of the size limits of SQL Express. Instead, I thought I'd try to extract, truncate, insert using a temp table since I've been meaning to try that method out.
I know I can use CTE to remove the duplicates, but every single time I try to do that it takes forever and my system locks up. My solution is to do the following.
1. Extract all rows to tempdb
2. Sort by Min(id)
3. Truncate original table
4. Transfer new unique data from tempdb back to main table
5. Take the extra duplicates and trim to uniques using Delimit
6. Import the leftover rows back into the database.
My table looks like the following.
Name Gender Age Email ID
Jolly Female 28 jolly#jolly.com 1
Jolly Female 28 jolly#jolly.com 2
Jolly Female 28 jolly#jolly.com 3
Kate Female 36 kate#kate.com 4
Kate Female 36 kate#kate.com 5
Kate Female 36 kate#kate.com 6
Jack Male 46 jack#jack.com 7
Jack Male 46 jack#jack.com 8
Jack Male 46 jack#jack.com 9
My code
SET IDENTITY_INSERT test.dbo.contacts ON
GO
select name, gender, age, email, id into ##contacts
from test.dbo.contacts
WHERE id IN
(SELECT MIN(id) FROM test.dbo.contacts GROUP BY name)
TRUNCATE TABLE test.dbo.contacts
INSERT INTO test.dbo.contacts
SELECT name, gender, age, total_score, id
from ##students
SET IDENTITY_INSERT test.dbo.contactsOFF
GO
This code is almost working, except for the following error that I see.
"An explicit value for the identity column in table 'test.dbo.contacts' can only be specified when a column list is used and IDENTITY_INSERT is ON.
I have absolutely no idea why I keep seeing that message since I turned identity_insert on and off.
Can somebody please tell me what I'm missing in the code? And if anybody has another solution to keep unique rows I'd love to hear about it.
You said that your original problem was that "it takes forever and my system locks up".
The problem is the amount of time necessary for the operation and the lock escalation to table lock.
My suggestion is to break down the operation so that you delete fewer than 5000 rows at a time.
I assume you have less than 5000 duplicates for each name.
You can read more about lock escalation here:
https://www.sqlpassion.at/archive/2014/02/25/lock-escalations/
About your problem (identity insert): your script contains at least two errors, so I guess it's not the original one, and it is therefore hard to say why the original one fails.
use test;
if object_ID('dbo.contacts') is not null drop table dbo.contacts;
CREATE TABLE dbo.contacts
(
id int identity(1,1) primary key clustered,
name nvarchar(50),
gender varchar(15),
age tinyint,
email nvarchar(50),
TS Timestamp
)
INSERT INTO [dbo].[contacts]([name],[gender],[age],[email])
VALUES
('Jolly','Female',28,'jolly#jolly.com'),
('Jolly','Female',28,'jolly#jolly.com'),
('Jolly','Female',28,'jolly#jolly.com'),
('Kate','Female',36,'kate#kate.com'),
('Kate','Female',36,'kate#kate.com'),
('Kate','Female',36,'kate#kate.com'),
('Jack','Male',46,'jack#jack.com'),
('Jack','Male',46,'jack#jack.com'),
('Jack','Male',46,'jack#jack.com');
--for the purpose of lock escalation, I assume you have less than 5,000 duplicates for each single name.
if object_ID('tempdb..#KillList') is not null drop table #KillList;
SELECT KL.*, C.TS
into #KillList
from
(
SELECT [name], min(ID) GoodID
from dbo.contacts
group by name
having count(*) > 1
) KL inner join
dbo.contacts C
ON KL.GoodID = C.id
--This has the purpose of testing concurrent updates on relevant rows
--UPDATE [dbo].[contacts] SET Age = 47 where ID=7;
--DELETE [dbo].[contacts] where ID=7;
while EXISTS (SELECT top 1 1 from #KillList)
BEGIN
DECLARE @id int;
DECLARE @name nvarchar(50);
DECLARE @TS binary(8);
SELECT top 1 @id = GoodID, @name = Name, @TS = TS from #KillList;
BEGIN TRAN
if exists (SELECT * from [dbo].[contacts] where id = @id and TS = @TS)
BEGIN
DELETE FROM C
from [dbo].[contacts] C
where id <> @id and Name = @name;
DELETE FROM #KillList where Name = @name;
END
ELSE
BEGIN
ROLLBACK TRAN;
RAISERROR('Concurrency error while deleting %s', 16, 1, @name);
RETURN;
END
commit TRAN;
END
SELECT * from [dbo].[contacts];
I wrote it this way so that you can see the intermediate results of each query.
The inner SQL should not select *; use only id instead:
delete from [contacts] where id in
(
select id from
(
select *, ROW_NUMBER() over (partition by name, gender, age, email order by id) as rowid from [contacts]
) rowstobedeleted where rowid>1
)
If this takes too long or creates too much load, you can use SET ROWCOUNT to work in smaller chunks, but then you need to run it repeatedly until nothing is deleted anymore.
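If you'd rather avoid SET ROWCOUNT (it is deprecated for affecting DML statements), here is a minimal sketch of the same idea using DELETE TOP; the 4000-row batch size is an arbitrary choice that stays under the 5000-row lock-escalation threshold mentioned above:
WHILE 1 = 1
BEGIN
    DELETE TOP (4000) FROM [contacts]
    WHERE id IN
    (
        select id from
        (
            select id, ROW_NUMBER() over (partition by name, gender, age, email order by id) as rowid
            from [contacts]
        ) rowstobedeleted where rowid > 1
    );
    IF @@ROWCOUNT = 0 BREAK;  -- stop once nothing is deleted anymore
END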
I think that you need something like this (when IDENTITY_INSERT is ON, the INSERT must specify an explicit column list):
INSERT INTO test.dbo.contacts (idcol1,col2)
VALUES (value1,value2)
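Applied to the script in the question, that means giving the failing INSERT an explicit column list. A sketch follows; it assumes the SELECT should read from ##contacts and the email column, since ##students and total_score look like copy/paste slips:
SET IDENTITY_INSERT test.dbo.contacts ON;

INSERT INTO test.dbo.contacts (name, gender, age, email, id)
SELECT name, gender, age, email, id
FROM ##contacts;

SET IDENTITY_INSERT test.dbo.contacts OFF;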

TSQL Update Issue

OK, SQL Server fans, I have an issue with a legacy stored procedure that sits inside a SQL Server 2008 R2 instance that I have inherited, along with the PROD data, which is, to say the least, horrible. Also, I can NOT make any changes to the data nor the table structures.
So here is my problem: the stored procedure in question runs daily and is used to update the employee table. As you can see from my example, the incoming data (#New_Employees) contains the updated data, and I need to use it to update the employee data stored in the #Existing_Employees table. Throughout the years different formats of the EMP_ID value have been used and must be maintained as-is (I fought and lost that battle). Thankfully, I have been successful in changing the format of the EMP_ID column in the #New_Employees table (yeah!), and any new records will use this format.
So now you may see my problem: I need to update the ID column in the #New_Employees table with the corresponding ID from the #Existing_Employees table by matching (that's right, you guessed it) on the EMP_ID columns. I came up with an extremely hacky way to handle the disparate formats of the EMP_ID columns, but it is very slow considering the number of rows that I need to process (1M+).
I thought of creating a staging table where I could simply cast the EMP_ID columns to an INT and then back to an NVARCHAR in each table to remove the leading zeros, and I am sort of leaning that way, but I wanted to see if there was another way to handle this dysfunctional data. Any constructive comments are welcome.
IF OBJECT_ID(N'TempDB..#NEW_EMPLOYEES') IS NOT NULL
DROP TABLE #NEW_EMPLOYEES
CREATE TABLE #NEW_EMPLOYEES(
ID INT
,EMP_ID NVARCHAR(50)
,NAME NVARCHAR(50))
GO
IF OBJECT_ID(N'TempDB..#EXISTING_EMPLOYEES') IS NOT NULL
DROP TABLE #EXISTING_EMPLOYEES
CREATE TABLE #EXISTING_EMPLOYEES(
ID INT PRIMARY KEY
,EMP_ID NVARCHAR(50)
,NAME NVARCHAR(50))
GO
INSERT INTO #NEW_EMPLOYEES
VALUES(NULL, '00123', 'Adam Arkin')
,(NULL, '00345', 'Bob Baker')
,(NULL, '00526', 'Charles Nelson O''Reilly')
,(NULL, '04321', 'David Numberman')
,(NULL, '44321', 'Ida Falcone')
INSERT INTO #EXISTING_EMPLOYEES
VALUES(1, '123', 'Adam Arkin')
,(2, '000345', 'Bob Baker')
,(3, '526', 'Charles Nelson O''Reilly')
,(4, '0004321', 'Ed Sullivan')
,(5, '02143', 'Frank Sinatra')
,(6, '5567', 'George Thorogood')
,(7, '0000123-1', 'Adam Arkin')
,(8, '7', 'Harry Hamilton')
-- First Method - Not Successful
UPDATE NE
SET ID = EE.ID
FROM
#NEW_EMPLOYEES NE
LEFT OUTER JOIN #EXISTING_EMPLOYEES EE
ON EE.EMP_ID = NE.EMP_ID
SELECT * FROM #NEW_EMPLOYEES
-- Second Method - Successful but Slow
UPDATE NE
SET ID = EE.ID
FROM
dbo.#NEW_EMPLOYEES NE
LEFT OUTER JOIN dbo.#EXISTING_EMPLOYEES EE
ON CAST(CASE WHEN NE.EMP_ID LIKE N'%[^0-9]%'
THEN NE.EMP_ID
ELSE LTRIM(STR(CAST(NE.EMP_ID AS INT))) END AS NVARCHAR(50)) =
CAST(CASE WHEN EE.EMP_ID LIKE N'%[^0-9]%'
THEN EE.EMP_ID
ELSE LTRIM(STR(CAST(EE.EMP_ID AS INT))) END AS NVARCHAR(50))
SELECT * FROM #NEW_EMPLOYEES
the number of rows that I need to process (1M+).
A million employees? Per day?
I think I would add a 3rd table:
create table #ids ( id     INT          not NULL PRIMARY KEY
                  , emp_id NVARCHAR(50) not NULL unique );
Populate that table using your LTRIM(STR(CAST(...))) (ahem) algorithm, and update Employees directly from a join of those three tables.
I recommend using an ANSI update, not Microsoft's nonstandard UPDATE ... FROM, because the ANSI version prevents nondeterministic results in cases where the FROM produces more than one matching row.
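A minimal sketch of that approach, reusing the normalization expression from the question and assuming the normalized EMP_ID values are unique (the UNIQUE constraint on #ids enforces that):
-- Normalize each existing EMP_ID exactly once:
INSERT INTO #ids (id, emp_id)
SELECT ID,
       CAST(CASE WHEN EMP_ID LIKE N'%[^0-9]%'
                 THEN EMP_ID
                 ELSE LTRIM(STR(CAST(EMP_ID AS INT))) END AS NVARCHAR(50))
FROM #EXISTING_EMPLOYEES;

-- ANSI-style update: the scalar subquery yields NULL when there is no match,
-- mirroring the original LEFT JOIN behavior:
UPDATE #NEW_EMPLOYEES
SET ID = (SELECT i.id
          FROM #ids i
          WHERE i.emp_id = CAST(CASE WHEN #NEW_EMPLOYEES.EMP_ID LIKE N'%[^0-9]%'
                                     THEN #NEW_EMPLOYEES.EMP_ID
                                     ELSE LTRIM(STR(CAST(#NEW_EMPLOYEES.EMP_ID AS INT))) END AS NVARCHAR(50)));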

T-SQL for Updating Rows with same value in a column

I have a table, let's say called FavoriteFruits, that has NAME, FRUIT, and GUID columns. The table is already populated with names and fruits. So let's say:
NAME FRUIT GUID
John Apple NULL
John Orange NULL
John Grapes NULL
Peter Canteloupe NULL
Peter Grapefruit NULL
OK, now I want to update the GUID column with a new GUID (using NEWID()), but I want the same GUID per distinct name. So I want all the Johns to have the same GUID, and I want both Peters to have the same GUID, but that GUID different from the one used for the Johns. So now it would look something like this:
NAME FRUIT GUID
John Apple f6172268-78b7-4c2b-8cd7-7a5ca20f6a01
John Orange f6172268-78b7-4c2b-8cd7-7a5ca20f6a01
John Grapes f6172268-78b7-4c2b-8cd7-7a5ca20f6a01
Peter Canteloupe e3b1851c-1927-491a-803e-6b3bce9bf223
Peter Grapefruit e3b1851c-1927-491a-803e-6b3bce9bf223
Can I do that in an update statement without having to use a cursor? If so can you please give an example?
Thanks guys...
Updating a CTE won't work here, because NEWID() would be evaluated per row. A table variable will work:
You should be able to use a table variable as a source from which to update the data. This is untested, but it'll look something like:
DECLARE #n TABLE (Name varchar(10), Guid uniqueidentifier);
INSERT #n
SELECT Name, newid() AS Guid
FROM FavoriteFruits
GROUP BY Name;
UPDATE f
SET f.Guid = n.Guid
FROM #n n
JOIN FavoriteFruits f ON f.Name = n.Name
So that populates a variable with a GUID per name, then joins it back to the original table and updates accordingly.
To clarify the comments regarding a table expression in the USING clause of a MERGE statement: the following won't work, because NEWID() is evaluated per row:
MERGE INTO FavoriteFruits
USING (
SELECT NAME, NEWID() AS GUID
FROM FavoriteFruits
GROUP
BY NAME
) AS source
ON source.NAME = FavoriteFruits.NAME
WHEN MATCHED THEN
UPDATE
SET GUID = source.GUID;
But the following, using a table variable, will work:
DECLARE #n TABLE
(
NAME VARCHAR(10) NOT NULL UNIQUE,
GUID UNIQUEIDENTIFIER NOT NULL UNIQUE
);
INSERT INTO #n (NAME, GUID)
SELECT NAME, NEWID()
FROM FavoriteFruits
GROUP
BY NAME;
MERGE INTO FavoriteFruits
USING #n AS source
ON source.NAME = FavoriteFruits.NAME
WHEN MATCHED THEN
UPDATE
SET GUID = source.GUID;
There's a single-statement solution too, which, however, has some limitations. The idea is to use OPENQUERY(), like this:
UPDATE FavoriteFruits
SET GUID = n.GUID
FROM (
SELECT NAME, GUID
FROM OPENQUERY(
linkedserver,
'SELECT NAME, NEWID() AS GUID FROM database.schema.FavoriteFruits GROUP BY NAME'
)
) n
WHERE FavoriteFruits.NAME = n.NAME
This solution implies that you need to create a self-pointing linked server. Another limitation is that you can't use this method on table variables or local temporary tables (global ones would work, as would 'normal' tables).
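For reference, a minimal sketch of creating such a self-pointing linked server (the name linkedserver matches the example above; the SQLNCLI provider is an assumption and may vary by SQL Server version):
-- @@SERVERNAME must be passed via a variable, since procedure
-- arguments can't be expressions:
DECLARE @srv sysname = @@SERVERNAME;
EXEC sp_addlinkedserver
     @server = N'linkedserver',
     @srvproduct = N'',
     @provider = N'SQLNCLI',
     @datasrc = @srv;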
