ON CONFLICT DO UPDATE/DO NOTHING not working on FOREIGN TABLE - database

The ON CONFLICT DO UPDATE / DO NOTHING feature was introduced in PostgreSQL 9.5. CREATE SERVER and foreign tables have been available since PostgreSQL 9.2.
When I use ON CONFLICT DO UPDATE on a foreign table it does not work, but when I run the same query on a normal table it works fine. The queries are given below.
-- For a normal table
INSERT INTO app (app_id, app_name, app_date)
SELECT p.app_id, p.app_name, p.app_date
FROM app p
WHERE p.app_id = 2422
ON CONFLICT (app_id) DO UPDATE
SET app_date = excluded.app_date;
O/P: Query returned successfully: one row affected, 5 msec execution time.
-- For the foreign table case:
-- foreign_app is a foreign table, app is a normal table
INSERT INTO foreign_app (app_id, app_name, app_date)
SELECT p.app_id, p.app_name, p.app_date
FROM app p
WHERE p.app_id = 2422
ON CONFLICT (app_id) DO UPDATE
SET app_date = excluded.app_date;
O/P: ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
Can anyone explain why this is happening?

There are no constraints on foreign tables, because PostgreSQL cannot enforce data integrity on the foreign server – that is done by constraints defined on the foreign server.
To achieve what you want to do, you'll have to stick with the “traditional” way of doing this (e.g. this code sample).
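A minimal sketch of that "traditional" pre-9.5 upsert pattern, using the same foreign_app/app tables as in the question; it works on a foreign table because it relies only on plain UPDATE and INSERT, not on a local unique constraint. Note that in production you would still have to handle the race between the two statements (e.g. by retrying or serializing writers):

```sql
-- Try to update first; no local unique constraint is needed.
UPDATE foreign_app f
SET    app_date = p.app_date
FROM   app p
WHERE  p.app_id = 2422
AND    f.app_id = p.app_id;

-- Then insert whatever is still missing on the remote side.
INSERT INTO foreign_app (app_id, app_name, app_date)
SELECT p.app_id, p.app_name, p.app_date
FROM   app p
WHERE  p.app_id = 2422
AND    NOT EXISTS (SELECT 1 FROM foreign_app f
                   WHERE  f.app_id = p.app_id);
```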

I know this is an old question, but in some cases there is a way to do it with ROW_NUMBER() OVER (PARTITION BY ...). In my case, my first take was to try ON CONFLICT ... DO UPDATE, but that doesn't work on foreign tables (as stated above; hence my finding this question). My problem was very specific: I had a foreign table (f_zips) to be populated with the best zip code (postal code) information possible. I also had a local table, postcodes, with very good data, and another local table, zips, with lower-quality zip code information but much more of it. For every record in postcodes there is a corresponding record in zips, but the postal codes may not match. I wanted f_zips to hold the best data.
I solved this with a union, with a value of ind = 0 as the indicator that a record came from the better data set; a value of ind = 1 indicates lesser-quality data. Then I used row_number() over a partition to get the answer (where get_valid_zip5() is a local function that returns either a five-digit zip code or a null value):
insert into f_zips (recnum, postcode)
select s2.recnum, s2.zip5
from (
    select s1.recnum, s1.zip5, s1.ind,
           row_number() over (partition by recnum order by s1.ind) as rn
    from (
        select recnum, get_valid_zip5(postcode) as zip5, 0 as ind
        from postcodes
        where get_valid_zip5(postcode) is not null
        union
        select recnum, get_valid_zip5(zip9) as zip5, 1 as ind
        from zips
        where get_valid_zip5(zip9) is not null
        order by 1, 3
    ) s1
) s2
where s2.rn = 1;
I haven't run any performance tests, but for me this runs in cron and doesn't directly affect the users.
Verified on more than 900,000 records (SQL formatting omitted for brevity):
/* yes, the preferred data was entered when it existed in both tables */
select t1.recnum, t1.postcode, t2.zip9 from postcodes t1 join zips t2 on t1.recnum = t2.recnum where t1.postcode is not null and t2.zip9 is not null and t2.zip9 not in ('0') and length(t1.postcode)=5 and length(t2.zip9)=5 and t1.postcode <> t2.zip9 order by 1 limit 5;
recnum | postcode | zip9
----------+----------+-------
12022783 | 98409 | 98984
12022965 | 98226 | 98225
12023113 | 98023 | 98003
select * from f_zips where recnum in (12022783, 12022965, 12023113) order by 1;
recnum | postcode
----------+----------
12022783 | 98409
12022965 | 98226
12023113 | 98023
/* yes, entries came from the less-preferred dataset when they didn't exist in the better one */
select t1.recnum, t1.postcode, t2.zip9 from postcodes t1 right join zips t2 on t1.recnum = t2.recnum where t1.postcode is null and t2.zip9 is not null and t2.zip9 not in ('0') and length(t2.zip9)= 5 order by 1 limit 3;
recnum | postcode | zip9
----------+----------+-------
12021451 | | 98370
12022341 | | 98501
12022695 | | 98597
select * from f_zips where recnum in (12021451, 12022341, 12022695) order by 1;
recnum | postcode
----------+----------
12021451 | 98370
12022341 | 98501
12022695 | 98597
/* yes, entries came from the preferred dataset when the less-preferred one had invalid values */
select t1.recnum, t1.postcode, t2.zip9 from postcodes t1 left join zips t2 on t1.recnum = t2.recnum where t1.postcode is not null and t2.zip9 is null order by 1 limit 3;
recnum | postcode | zip9
----------+----------+------
12393585 | 98118 |
12393757 | 98101 |
12393835 | 98101 |
select * from f_zips where recnum in (12393585, 12393757, 12393835) order by 1;
recnum | postcode
----------+----------
12393585 | 98118
12393757 | 98101
12393835 | 98101

Related

SQL Server: Performance issue: OR statement substitute in WHERE clause

I want to select only the records from the table Stock based on the column PostingDate.
The PostingDate should be after the InitDate in another table called InitClient. Currently there are 2 clients in both tables (client 1 and client 2), each with a different InitDate.
With the code below I get exactly what I need, based on the sample data included underneath. However, two problems arise. First, with millions of records the query takes way too long (hours). Second, it isn't dynamic at all: it has to be edited every time a new client is added.
A potential option to cover the performance issue would be to write two separate queries, one for client 1 and one for client 2, with a UNION in between. Unfortunately, that isn't dynamic enough either, since multiple clients are possible.
SELECT
Material
,Stock
,Stock.PostingDate
,Stock.Client
FROM Stock
LEFT JOIN (SELECT InitDate FROM InitClient where Client = 1) C1 ON 1=1
LEFT JOIN (SELECT InitDate FROM InitClient where Client = 2) C2 ON 1=1
WHERE
(
(Stock.Client = 1 AND Stock.PostingDate > C1.InitDate) OR
(Stock.Client = 2 AND Stock.PostingDate > C2.InitDate)
)
Sample dataset:
CREATE TABLE InitClient
(
Client varchar(300),
InitDate date
);
INSERT INTO InitClient (Client,InitDate)
VALUES
('1', '5/1/2021'),
('2', '1/31/2021');
SELECT * FROM InitClient
CREATE TABLE Stock
(
Material varchar(300),
PostingDate varchar(300),
Stock varchar(300),
Client varchar(300)
);
INSERT INTO Stock (Material,PostingDate,Stock,Client)
VALUES
('322', '1/1/2021', '5', '1'),
('101', '2/1/2021', '5', '2'),
('322', '3/2/2021', '10', '1'),
('101', '4/13/2021', '5', '1'),
('400', '5/11/2021', '170', '2'),
('401', '6/20/2021', '200', '1'),
('322', '7/20/2021', '160', '2'),
('400', '8/9/2021', '93', '2');
SELECT * FROM Stock
Desired result, but then with a substitute for the OR statement to ramp up the performance:
| Material | PostingDate | Stock | Client |
|----------|-------------|-------|--------|
| 322 | 1/1/2021 | 5 | 1 |
| 101 | 2/1/2021 | 5 | 2 |
| 322 | 3/2/2021 | 10 | 1 |
| 101 | 4/13/2021 | 5 | 1 |
| 400 | 5/11/2021 | 170 | 2 |
| 401 | 6/20/2021 | 200 | 1 |
| 322 | 7/20/2021 | 160 | 2 |
| 400 | 8/9/2021 | 93 | 2 |
Any suggestions for a substitute for the OR construct in the above code that keeps performance while staying dynamic?
You can optimize this query quite a bit.
Firstly, those two LEFT JOINs are really just semi-joins, because you don't return any columns from them. So we can turn them into a single EXISTS.
You will also get an implicit conversion to int, because Client is varchar and 1,2 are int literals. So change them to '1','2', or better, change the column type.
PostingDate is also varchar; it should really be date.
SELECT
s.Material
,s.Stock
,s.PostingDate
,s.Client
FROM Stock s
WHERE s.Client IN ('1','2')
AND EXISTS (SELECT 1
FROM InitClient c
WHERE s.PostingDate > c.InitDate
AND c.Client = s.Client
);
Next you want to look at indexing. For this query (not accounting for any other queries being run), you probably want the following indexes (remove the INCLUDE for a clustered index)
InitClient (Client, InitDate)
Stock (Client) INCLUDE (PostingDate, Material, Stock)
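Sketched as T-SQL DDL (index names are illustrative; as noted, drop the INCLUDE if you make the Stock index clustered instead):

```sql
CREATE INDEX IX_InitClient_Client_InitDate
    ON InitClient (Client, InitDate);

CREATE INDEX IX_Stock_Client
    ON Stock (Client)
    INCLUDE (PostingDate, Material, Stock);
```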
It is possible that even with these indexes you may get a scan on Stock, because IN functions like an OR. This does not always happen, so it's worth checking the execution plan. If it does, you can instead rewrite the query to use UNION ALL:
SELECT
s.Material
,s.Stock
,s.PostingDate
,s.Client
FROM (
SELECT *
FROM Stock s
WHERE s.Client = '1'
UNION ALL
SELECT *
FROM Stock s
WHERE s.Client = '2'
) s
WHERE EXISTS (SELECT 1
FROM InitClient c
WHERE s.PostingDate > c.InitDate
AND c.Client = s.Client
);
db<>fiddle
There is nothing wrong in expecting your query to be dynamic. However, in order to make it more performant, you may need to reach a compromise between two conflicting expectations. I will present a few ways to optimize your query; some of them involve drastic changes, but eventually it is you or your client who decides how this needs to be improved. Also, some of the improvements might turn out to be ineffective, so do not take anything for granted; test everything. Without further ado, here are the suggestions.
The query
First I would try to change the query a little, maybe something like this could help you
SELECT
Material
,Stock
,Stock.PostingDate
,C1.InitDate
,C2.InitDate
,Stock.Client
FROM Stock
LEFT JOIN InitClient C1 ON C1.Client = 1
LEFT JOIN InitClient C2 ON C2.Client = 2
WHERE
(
(Stock.Client = 1 AND Stock.PostingDate > C1.InitDate) OR
(Stock.Client = 2 AND Stock.PostingDate > C2.InitDate)
)
Sometimes a simple step of getting rid of subselects does the trick
The indexes
You may want to speed up your process by creating indexes, for example on Stock.PostingDate.
Helper table
You can create a helper table where you store the Stock records' relevant data, so you perform the slow query only occasionally (say, once a week, or each time a new client enters the stage) and store the results in the helper table. Once the prerequisite calculation is done, you will be able to query only the helper table with its few records, reaching lightning-fast behavior. The idea is to execute the slow query rarely, cache/store the results, and reuse them instead of recalculating every time.
A new column
You could create a column in your Stock table named InitDate and fill that with data for each record periodically. It will take a long while at the first execution, but then you will be able to query only the Stock table without joins and subselects.
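A sketch of that last idea, using the table and column names from the question and assuming PostingDate has been converted to date as recommended above (the refresh could run from a scheduled job):

```sql
-- One-time schema change
ALTER TABLE Stock ADD InitDate date NULL;

-- Periodic refresh: copy each client's InitDate onto its Stock rows
UPDATE s
SET    s.InitDate = c.InitDate
FROM   Stock s
JOIN   InitClient c ON c.Client = s.Client
WHERE  s.InitDate IS NULL;

-- The report query then needs no join at all
SELECT Material, Stock, PostingDate, Client
FROM   Stock
WHERE  PostingDate > InitDate;
```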

Simplify multiple joins

I have a Claims table with 70 columns, 16 of which contain diagnosis codes. The codes mean nothing, so I need to pull the descriptions for each code located in a separate table.
There has to be a simpler way of pulling these code descriptions:
-- This is the claims table
FROM
[database].[schema].[claimtable] AS claim
-- [StagingDB].[schema].[Diagnosis] table where the codes located
-- [ICD10_CODE] column contains the code
LEFT JOIN
[StagingDB].[schema].[Diagnosis] AS diag1 ON claim.[ICDDiag1] = diag1.[ICD10_CODE]
LEFT JOIN
[StagingDB].[schema].[Diagnosis] AS diag2 ON claim.[ICDDiag2] = diag2.[ICD10_CODE]
LEFT JOIN
[StagingDB].[schema].[Diagnosis] AS diag3 ON claim.[ICDDiag3] = diag3.[ICD10_CODE]
-- and so on, up to ....
LEFT JOIN
[StagingDB].[schema].[Diagnosis]AS diag16 ON claim.[ICDDiag16] = diag16.[ICD10_CODE]
-- reported column will be [code_desc]
-- ie:
-- diag1.[code_desc] AS Diagnosis1
-- diag2.[code_desc] AS Diagnosis2
-- diag3.[code_desc] AS Diagnosis3
-- diag4.[code_desc] AS Diagnosis4
-- etc.
I think what you are doing is already correct in the given scenario.
Another approach, which you can try and compare for performance:
i) Unpivot the 16 diagnosis columns of the claim table into rows.
ii) Join the unpivoted column with [StagingDB].[schema].[Diagnosis].
Another option is to copy the [StagingDB].[schema].[Diagnosis] table into a #temp table
instead of touching the large staging table 16 times.
Some data analysis has to be done to decide which way is best.
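A minimal sketch of the #temp-table idea (names are illustrative; the point is to copy the lookup columns locally once, then run the 16 LEFT JOINs against the copy exactly as before):

```sql
-- Copy only the columns the joins need
SELECT [ICD10_CODE], [code_desc]
INTO   #Diagnosis
FROM   [StagingDB].[schema].[Diagnosis];

-- Index the join key so each of the 16 joins can seek
CREATE CLUSTERED INDEX IX_Diag ON #Diagnosis ([ICD10_CODE]);

-- Then: LEFT JOIN #Diagnosis AS diag1 ON claim.[ICDDiag1] = diag1.[ICD10_CODE]
--       ... up to diag16, as in the original query
```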
You can go for UNPIVOT of the claim table and then join with the Diagnosis table.
TEST SETUP
CREATE TABLE #ClaimTable (ClaimId INT, Diag1 VARCHAR(10), Diag2 VARCHAR(10));
CREATE TABLE #Diagnosis (code VARCHAR(10), code_Desc VARCHAR(255));
INSERT INTO #ClaimTable
VALUES (1, 'Fever', 'Cold'), (2, 'Headache', 'toothache');
INSERT INTO #Diagnosis
VALUES ('Fever', 'Fever Desc'), ('cold', 'cold desc'), ('headache', 'headache desc'), ('toothache', 'toothache desc');
Query to Run
;WITH CTE_Claims AS
(SELECT ClaimId,DiagnosisNumeral, code
FROM #claimTable
UNPIVOT
(
code FOR DiagnosisNumeral in ([Diag1],[Diag2])
) as t
)
SELECT c.ClaimId, c.code, d.code_Desc
FROM CTE_Claims AS c
INNER JOIN #Diagnosis as d
on c.code = d.code
ResultSet
+---------+-----------+----------------+
| ClaimId | code | code_Desc |
+---------+-----------+----------------+
| 1 | Fever | Fever Desc |
| 1 | Cold | cold desc |
| 2 | Headache | headache desc |
| 2 | toothache | toothache desc |
+---------+-----------+----------------+

Lookup delimited values in a table in sql-server

In a table A I have a column city-id (varchar(30)) with values like 1,2,3 or 2,4.
The descriptions of the values are stored in another table B, e.g.
1 Amsterdam
2 The Hague
3 Maastricht
4 Rotterdam
How must I join table A with table B to get the descriptions, in one or maybe more rows?
Assuming this is what you meant:
Table A:
id
-------
1
2
3
Table B:
id | Place
-----------
1 | Amsterdam
2 | The Hague
3 | Maastricht
4 | Rotterdam
Keep id column in both tables as auto increment, and PK.
Then just do a simple inner join.
select * from A inner join B on (A.id = B.id);
The ideal way to deal with such scenarios is to have a normalized table, as Collin suggested. In case that can't be done, here is the way to go about it:
You would need a table-valued function to split the comma-separated value. If you are on SQL Server 2016 or later, there is a built-in STRING_SPLIT function; if not, you would need to create one as shown in this link.
create table dbo.sCity(
CityId varchar(30)
);
create table dbo.sCityDescription(
CityId int
,CityDescription varchar(30)
);
insert into dbo.sCity values
('1,2,3')
,('2,4');
insert into dbo.sCityDescription values
(1,'Amsterdam')
,(2,'The Hague')
,(3,'Maastricht')
,(4,'Rotterdam');
select ctds.CityDescription
,sst.Value as 'CityId'
from dbo.sCity ct
cross apply dbo.SplitString(CityId,',') sst
join dbo.sCityDescription ctds
on sst.Value = ctds.CityId;
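On SQL Server 2016+, the same lookup can use the built-in STRING_SPLIT instead of a hand-rolled splitter; a sketch against the same sample tables (STRING_SPLIT returns its pieces in a column named value):

```sql
SELECT ctds.CityDescription,
       sst.value AS CityId
FROM   dbo.sCity ct
CROSS APPLY STRING_SPLIT(ct.CityId, ',') sst
JOIN   dbo.sCityDescription ctds
       ON sst.value = ctds.CityId;
```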

SQLite query with unknown foreign key

I am playing around with a SQLite database in a vb.net application. The database is supposed to store time series data for many variables.
Right now I am trying to build the database with 2 tables as followed:
Table varNames:
CREATE TABLE IF NOT EXISTS varNames(id INTEGER PRIMARY KEY, varName TEXT UNIQUE);
It looks like this:
ID | varName
---------------
1 | var1
2 | var2
... | ...
Table varValues:
CREATE TABLE IF NOT EXISTS varValues(timestamp INTEGER, varValue FLOAT, id INTEGER, FOREIGN KEY(id) REFERENCES varNames(id) ON DELETE CASCADE);
It looks like this:
timestamp | varValue | id
------------------------------
1 | 1.0345 | 1
4 | 3.5643 | 1
1 | 7.7866 | 2
3 | 4.5668 | 2
... | .... | ...
The first table contains all variable names with IDs. The second table contains the values of each variable for many time steps (indicated by the timestamps). A foreign key links the tables through the variable IDs.
Building up the database works fine.
Now I want to query the database and plot the time series for selected variables. For this I use the following statement:
select [timestamp], [varValue] FROM varValues WHERE (SELECT id from varNames WHERE varName= '" & NAMEvariable & "');
Since the user does not know the variable ID, only the name of the variable (in NAMEvariable), I use the WHERE (SELECT ... ) construct. It seems like this really slows down the performance. The time series have up to 50k points.
Is there any better way to query values for a specific variable that can only be addressed by its name?
You probably should use a join query, something like:
SELECT a.[timestamp], a.varValue
FROM varValues AS a, varNames AS b
WHERE b.varName = <name>
AND a.id = b.ID
edit: To query for more than one parameter, use something like this:
SELECT a.[timestamp], a.varValue
FROM varValues AS a, varNames AS b
WHERE b.varName IN (<name1>, <name2>, ...)
AND a.id = b.ID
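With tens of thousands of rows per variable, the join will still scan varValues unless its id column is indexed (varName already has an implicit index from its UNIQUE constraint). A sketch of the supporting index, with illustrative names:

```sql
-- Let the join seek varValues rows by variable id
CREATE INDEX IF NOT EXISTS idx_varValues_id
    ON varValues(id);

-- Or, if queries always filter by id and read in time order:
CREATE INDEX IF NOT EXISTS idx_varValues_id_ts
    ON varValues(id, timestamp);
```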

Set-based approach to updating multiple tables, rather than a WHILE loop?

Apparently I'm far too used to procedural programming, and I don't know how to handle this with a set-based approach.
I have several temporary tables in SQL Server, each with thousands of records. Some of them have tens of thousands of records each, but they're all part of a record set. I'm basically loading a bunch of xml data that looks like this:
<root>
<entry>
<id-number>12345678</id-number>
<col1>blah</col1>
<col2>heh</col2>
<more-information>
<col1>werr</col1>
<col2>pop</col2>
<col3>test</col3>
</more-information>
<even-more-information>
<col1>czxn</col1>
<col2>asd</col2>
<col3>yyuy</col3>
<col4>moat</col4>
</even-more-information>
<even-more-information>
<col1>uioi</col1>
<col2>qwe</col2>
<col3>rtyu</col3>
<col4>poiu</col4>
</even-more-information>
</entry>
<entry>
<id-number>12345679</id-number>
<col1>bleh</col1>
<col2>sup</col2>
<more-information>
<col1>rrew</col1>
<col2>top</col2>
<col3>nest</col3>
</more-information>
<more-information>
<col1>234k</col1>
<col2>fftw</col2>
<col3>west</col3>
</more-information>
<even-more-information>
<col1>asdj</col1>
<col2>dsa</col2>
<col3>mnbb</col3>
<col4>boat</col4>
</even-more-information>
</entry>
</root>
Here's a brief display of what the temporary tables look like:
Temporary Table 1 (entry):
+------------+--------+--------+
| UniqueID | col1 | col2 |
+------------+--------+--------+
| 732013 | blah | heh |
| 732014 | bleh | sup |
+------------+--------+--------+
Temporary Table 2 (more-information):
+------------+--------+--------+--------+
| UniqueID | col1 | col2 | col3 |
+------------+--------+--------+--------+
| 732013 | werr | pop | test |
| 732014 | rrew | top | nest |
| 732014 | 234k | ffw | west |
+------------+--------+--------+--------+
Temporary Table 3 (even-more-information):
+------------+--------+--------+--------+--------+
| UniqueID | col1 | col2 | col3 | col4 |
+------------+--------+--------+--------+--------+
| 732013 | czxn | asd | yyuy | moat |
| 732013 | uioi | qwe | rtyu | poiu |
| 732014 | asdj | dsa | mnbb | boat |
+------------+--------+--------+--------+--------+
I am loading this data from an XML file, and have found that this is the only way I can tell which information belongs to which record, so every single temporary table has the following inserted at the top:
T.value('../../id-number[1]', 'VARCHAR(8)') UniqueID,
As you can see, each temporary table has a UniqueID assigned to its particular record to indicate which main record it belongs to. I have a large set of items in the database, and I want to update every single column in each non-temporary table using a set-based approach, restricted by UniqueID.
In tables other than the first one, there is a Foreign_ID based on the PrimaryKey_ID of the main table, and the UniqueID will not be inserted; it's just there to help tell what goes where.
Here's the exact logic that I'm trying to figure out:
If id-number already exists in the main table, update the other tables based on the PrimaryKey_ID of the main table, which is the exact same number in every other table's Foreign_ID. The foreign-keyed tables use a totally different numbering than the id-number; the two are not the same.
If id-number does not exist, insert the record. I have done this part already.
However, I'm currently stuck in the mind-set that I have to set temporary variables, such as @IDNumber and @ForeignID, and then loop through them. Not only am I getting multiple results instead of the correct result, but everyone says WHILE loops shouldn't be used, especially for such a large volume of data.
How do I update these tables using a set-based approach?
Assuming you already have this XML extracted, you could do something similar to:
UPDATE ent
SET ent.col1 = tmp1.col1,
ent.col2 = tmp1.col2
FROM dbo.[Entry] ent
INNER JOIN #TempEntry tmp1
ON tmp1.UniqueID = ent.UniqueID;
UPDATE mi
SET mi.col1 = tmp2.col1,
mi.col2 = tmp2.col2,
mi.col3 = tmp2.col3
FROM dbo.[MoreInformation] mi
INNER JOIN dbo.[Entry] ent -- mapping of Foreign_ID ->UniqueID
ON ent.PrimaryKey_ID = mi.Foreign_ID
INNER JOIN #TempMoreInfo tmp2
ON tmp2.UniqueID = ent.UniqueID
AND tmp2.SomeOtherField = mi.SomeOtherField; -- need 1 more field
UPDATE emi
SET emi.col1 = tmp3.col1,
emi.col2 = tmp3.col2,
emi.col3 = tmp3.col3,
emi.col4 = tmp3.col4
FROM dbo.[EvenMoreInformation] emi
INNER JOIN dbo.[Entry] ent -- mapping of Foreign_ID -> UniqueID
ON ent.PrimaryKey_ID = emi.Foreign_ID
INNER JOIN #TempEvenMoreInfo tmp3
ON tmp3.UniqueID = ent.UniqueID
AND tmp3.SomeOtherField = emi.SomeOtherField; -- need 1 more field
Now, I should point out that if the goal is truly to
update every single column in each non-temporary table
then there is a conceptual issue for any sub-tables that have multiple records. If there is no record in that table that will remain the same outside of the Foreign_ID field (and I guess the PK of that table?), then how do you know which row is which for the update? Sure, you can find the correct Foreign_ID based on the UniqueID mapping already in the non-temporary Entry table, but there needs to be at least one field that is not an IDENTITY (or UNIQUEIDENTIFIER populated via NEWID or NEWSEQUENTIALID) that will be used to find the exact row.
If it is not possible to find a stable, matching field, then you have no choice but to do a wipe-and-replace method instead.
P.S. I used to recommend the MERGE command but have since stopped due to learning of all of the bugs and issues with it. The "nicer" syntax is just not worth the potential problems. For more info, please see Use Caution with SQL Server's MERGE Statement.
You can use MERGE, which does an upsert (update and insert) in a single statement.
First merge the entries into the main table.
For the other tables, you can join with the main table to get the foreign-ID mapping:
MERGE Table2 AS Dest
USING ( SELECT t2.*, m.PrimaryKey_ID AS foreign_ID
        FROM #tempTable2 t2
        JOIN mainTable m
          ON t2.[id-number] = m.[id-number]
      ) AS Source
ON Dest.Foreign_ID = Source.foreign_ID
WHEN MATCHED THEN
    UPDATE SET Dest.col1 = Source.col1
WHEN NOT MATCHED THEN
    INSERT (Foreign_ID, col1, col2, ...)
    VALUES (Source.foreign_ID, Source.col1, Source.col2, ...);
