I could not find anything online about this problem. Oracle seems to have a similar issue, which is handled using DUAL, as suggested in this StackOverflow answer.
But how do we do the same in Exasol? According to their documentation, you need to MERGE using a secondary table.
I tried the same approach as on Oracle, to no avail:
MERGE INTO TEST.TABLE USING SYS.DUAL ON "COLUMN_1" = "foo"
WHEN MATCHED THEN UPDATE SET "COLUMN_2" = "quux"
WHEN NOT MATCHED THEN INSERT ("COLUMN_1", "COLUMN_2") VALUES ("foo", "bar")
[2021-12-15 11:22:47] [0A000] Feature not supported: Merge using a system table as source that is no view. (Session: 1719203050222845952)
Is it not possible to UPSERT like in other RDBMSs?
You can rewrite the query like this:
MERGE INTO TEST.TABLE
USING (select 'foo' as s) s ON COLUMN_1 = s
WHEN MATCHED THEN UPDATE SET COLUMN_2 = 'quux'
WHEN NOT MATCHED THEN INSERT (COLUMN_1, COLUMN_2) VALUES ('foo', 'bar')
This way you eliminate the DUAL table: the one-row subquery serves as the source, and the ON clause still joins against the target correctly.
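If it helps to see the two MERGE branches spelled out, here is a minimal sketch of the same upsert semantics. It runs on SQLite through Python's sqlite3 module purely as a stand-in (SQLite has no MERGE), so it is not Exasol syntax. The table name test_table and the helper upsert are made up for illustration.

```python
import sqlite3

def upsert(conn, key, insert_value, update_value):
    """Emulate MERGE: update column_2 if the key exists, else insert a row."""
    with conn:  # run both statements in one transaction
        # WHEN MATCHED THEN UPDATE (must run first, or it would hit the new row)
        conn.execute(
            "UPDATE test_table SET column_2 = ? WHERE column_1 = ?",
            (update_value, key),
        )
        # WHEN NOT MATCHED THEN INSERT: the one-row SELECT acts as the source
        conn.execute(
            "INSERT INTO test_table (column_1, column_2) "
            "SELECT s.k, ? FROM (SELECT ? AS k) AS s "
            "WHERE NOT EXISTS (SELECT 1 FROM test_table t WHERE t.column_1 = s.k)",
            (insert_value, key),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (column_1 TEXT PRIMARY KEY, column_2 TEXT)")
upsert(conn, "foo", "bar", "quux")  # no row yet: inserts ('foo', 'bar')
upsert(conn, "foo", "bar", "quux")  # row exists: updates it to ('foo', 'quux')
print(conn.execute("SELECT * FROM test_table").fetchall())  # [('foo', 'quux')]
```

The derived one-row source does the same job here as `(select 'foo' as s)` does in the MERGE above.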
I am facing performance issues with a MERGE statement over large tables.
The source table for the merge is basically a clone of the target table after applying some DML.
For example, below PUBLIC.customer is the target and STAGING.customer is the source.
CREATE OR REPLACE TABLE STAGING.customer CLONE PUBLIC.customer;
MERGE INTO STAGING.customer TARGET USING (SELECT * FROM NEW_CUSTOMER) AS SOURCE ON TARGET.ID = SOURCE.ID
WHEN MATCHED AND SOURCE.DELETEFLAG=TRUE THEN DELETE
WHEN MATCHED AND TARGET.ROWMODIFIED < SOURCE.ROWMODIFIED THEN UPDATE SET TARGET.AGE = SOURCE.AGE, ...
WHEN NOT MATCHED THEN INSERT (AGE, DELETEFLAG, ID,...) VALUES (SOURCE.AGE, SOURCE.DELETEFLAG, SOURCE.ID,...);
Currently, we are simply merging the STAGING.customer back to PUBLIC.customer at the end.
This final merge statement is very costly for some of the large tables.
While looking for a way to reduce the cost, I discovered Snowflake's CHANGES clause. As per the documentation,
Currently, at least one of the following must be true before change tracking metadata is recorded for a table:
Change tracking is enabled on the table (using ALTER TABLE … CHANGE_TRACKING = TRUE).
A stream is created for the table (using CREATE STREAM).
Both options add hidden columns to the table which store change tracking metadata. The columns consume a small amount of storage.
I assumed that the metadata added to the table is equivalent to the result-set of the select statement using "changes" clause, which doesn't seem to be the case.
INSERT INTO PUBLIC.CUSTOMER(AGE,...) (SELECT AGE,... FROM STAGING.CUSTOMER CHANGES (information => default) at(timestamp => 1675772176::timestamp) where "METADATA$ACTION" = 'INSERT' );
The select statement using the "changes" clause is much slower than the merge statement that I am using currently.
I checked the execution plan and found that Snowflake performs a (sort of) self-join on the table at two different timestamps.
Is this really the expected behaviour, or am I missing something here? I was hoping for better performance, assuming it would scan the table once and then simply insert the new records, which should be faster than the merge statement.
Also, even if it does a self-join, why does the merge query perform better? The merge query also joins similar volumes.
I was also hoping to use the same mechanism for deletes/updates on the source table.
I have a table in which I need to add the same values to a whole bunch of items
(in a nutshell, if the item doesn't have a UNIT of "CTN", I want to add the same values I have listed to all of them).
I thought the following would work but it doesn't :(
Any idea what I am doing wrong?
INSERT INTO ICUNIT
(UNIT,AUDTDATE,AUDTTIME,AUDTUSER,AUDTORG,CONVERSION)
VALUES ('CTN','20220509','22513927','ADMIN','AU','1')
WHERE ITEMNO In '0','etc','etc','etc'
If I understand correctly you might want to use INSERT INTO ... SELECT from original table with your condition.
INSERT INTO ICUNIT (ITEMNO,UNIT,AUDTDATE,AUDTTIME,AUDTUSER,AUDTORG,CONVERSION)
SELECT DISTINCT ITEMNO,'CTN','20220509','22513927','ADMIN','AU','1'
FROM ICUNIT
WHERE ITEMNO In ('0','etc','etc','etc')
The query you need starts by selecting the filtered items. So something like the query below seems to be your starting point:
select <?> from dbo.ICUNIT as icu where icu.UNIT <> 'CTN' order by ...;
Notice the use of schema name, terminators, and table aliases - all best practices. I will guess that a given "item" can have multiple rows in this table so long as UNIT is unique within ITEMNO. Correct? If so, the above query won't work, because an item that has a "CTN" row would still be returned through its other rows. So let's try slightly more complicated filtering.
select distinct icu.ITEMNO
from dbo.ICUNIT as icu
where not exists (select * from dbo.ICUNIT as ctns
where ctns.ITEMNO = icu.ITEMNO -- correlating the subquery
and ctns.UNIT = 'CTN')
order by ...;
There are other ways to do the above, but that is one common way. That query will produce a resultset of all ITEMNO values in your table that do not already have a row where UNIT is "CTN". If you need to filter that for specific ITEMNO values, you simply adjust the WHERE clause. If that works correctly, you can use it with your insert statement to insert the desired rows.
insert into dbo.ICUNIT (...)
select distinct icu.ITEMNO, 'CTN', '20220509', '22513927', 'ADMIN', 'AU', '1'
from ...
;
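To make the pattern concrete, here is a small runnable sketch of the INSERT ... SELECT with the NOT EXISTS filter. It uses SQLite through Python's sqlite3 module as a stand-in for your database. The column types, the composite primary key, and the helper name add_ctn_units are my assumptions, not something taken from your schema.

```python
import sqlite3

AUDIT = ("20220509", "22513927", "ADMIN", "AU", "1")  # values from the question

def add_ctn_units(conn):
    """Insert one 'CTN' row for every ITEMNO that does not have one yet."""
    with conn:
        conn.execute(
            """
            INSERT INTO ICUNIT
                (ITEMNO, UNIT, AUDTDATE, AUDTTIME, AUDTUSER, AUDTORG, CONVERSION)
            SELECT DISTINCT icu.ITEMNO, 'CTN', ?, ?, ?, ?, ?
            FROM ICUNIT AS icu
            WHERE NOT EXISTS (SELECT 1 FROM ICUNIT AS ctns
                              WHERE ctns.ITEMNO = icu.ITEMNO  -- correlate on item
                                AND ctns.UNIT = 'CTN')
            """,
            AUDIT,
        )

conn = sqlite3.connect(":memory:")
# Assumed schema: text columns, one row per (ITEMNO, UNIT).
conn.execute(
    "CREATE TABLE ICUNIT (ITEMNO TEXT, UNIT TEXT, AUDTDATE TEXT, AUDTTIME TEXT, "
    "AUDTUSER TEXT, AUDTORG TEXT, CONVERSION TEXT, PRIMARY KEY (ITEMNO, UNIT))"
)
conn.executemany(
    "INSERT INTO ICUNIT (ITEMNO, UNIT) VALUES (?, ?)",
    [("A", "EA"), ("B", "EA"), ("B", "CTN"), ("C", "EA"), ("C", "BOX")],
)
add_ctn_units(conn)
ctn_items = [r[0] for r in conn.execute(
    "SELECT ITEMNO FROM ICUNIT WHERE UNIT = 'CTN' ORDER BY ITEMNO")]
print(ctn_items)  # ['A', 'B', 'C']
```

Item C has two non-CTN rows, which is why the DISTINCT matters: without it the insert would try to add two CTN rows for C.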
So this is a continuation of post:
Best way to get identity of inserted row?
That post proposes, and I agree, using the OUTPUT clause with INSERTED to safely return the inserted id column(s).
While implementing this, it seems the SqlClient of the .NET Framework does not support it: the command fails on execution with the following exception:
System.Data.SqlClient.SqlException: 'Cannot find either column "INSERTED" or the user-defined function or aggregate "INSERTED.Id", or the name is ambiguous.'
I'm just using:
return (T)command.ExecuteScalar();
Where the query is:
INSERT INTO MyTable
OUTPUT INSERTED.Id
(Description)
VALUES (@Description)
And the table just contains
ID (identity int)
Description (varchar(max))
If this is impossible, is there another safe way that avoids intermediate variables, which might affect performance?
Thanks
You are doing everything correctly, but you have misplaced the OUTPUT clause: it goes after the list of columns and before the VALUES, i.e.
INSERT INTO MyTable (Description)
OUTPUT INSERTED.Id
VALUES (@Description)
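For comparison, here is a rough sketch of the same "insert and get the generated id back in one round trip" idea. It uses Python's sqlite3 module, not SqlClient, so the syntax differs: SQLite's RETURNING clause (available from SQLite 3.35) plays a role similar to OUTPUT INSERTED.Id, and the sketch falls back to cursor.lastrowid on older SQLite builds. The helper name insert_description is made up.

```python
import sqlite3

def insert_description(conn, description):
    """Insert a row into MyTable and return its generated Id."""
    with conn:
        if sqlite3.sqlite_version_info >= (3, 35, 0):
            # RETURNING hands back the new Id from the INSERT itself.
            rows = conn.execute(
                "INSERT INTO MyTable (Description) VALUES (?) RETURNING Id",
                (description,),
            ).fetchall()  # fetch everything so the statement finishes before commit
            return rows[0][0]
        # Older SQLite: lastrowid is per cursor, so it is not affected by
        # inserts made through other cursors or connections.
        cur = conn.execute(
            "INSERT INTO MyTable (Description) VALUES (?)", (description,)
        )
        return cur.lastrowid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Id INTEGER PRIMARY KEY AUTOINCREMENT, Description TEXT)"
)
first_id = insert_description(conn, "first")
second_id = insert_description(conn, "second")
print(first_id, second_id)  # 1 2
```

In both dialects the clause comes after the column list, which is the placement the original query got wrong.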
I am very new to SQL and SQL server, would appreciate any help with the following problem.
I am trying to update a share price table with new prices.
The table has three columns: share code, date, price.
The share code + date = PK
As you can imagine, if you have thousands of share codes and 10 years' data for each, the table can get very big. So I have created a separate table called a share ID table, and use a share ID instead in the first table (I was reliably informed this would speed up the query, as searching by integer is faster than string).
So, to summarise, I have two tables as follows:
Table 1 = Share_code_ID (int), Date, Price
Table 2 = Share_code_ID (int), Share_name (string)
So let's say I want to update the table/s with today's price for share ZZZ. I need to:
Look for the Share_code_ID corresponding to 'ZZZ' in table 2
If it is found, update table 1 with the new price for that date, using the Share_code_ID I just found
If the Share_code_ID is not found, update both tables
Let's ignore for now how the Share_code_ID is generated for a new code, I'll worry about that later.
I'm trying to use a merge query loosely based on the following structure, but have no idea what I am doing:
MERGE INTO [Table 1]
USING (VALUES (1,23-May-2013,1000)) AS SOURCE (Share_code_ID,Date,Price)
{ SEEMS LIKE THERE SHOULD BE AN INNER JOIN HERE OR SOMETHING }
ON Table 2 = 'ZZZ'
WHEN MATCHED THEN UPDATE SET Table 1.Price = 1000
WHEN NOT MATCHED THEN INSERT { TO BOTH TABLES }
Any help would be appreciated.
http://msdn.microsoft.com/library/bb510625(v=sql.100).aspx
You use Table1 as the target table and Table2 as the source table.
You want to perform an action when the given ID is not found in Table2, i.e. in the source table.
In the documentation, which you have already read, that corresponds to the clause
WHEN NOT MATCHED BY SOURCE ... THEN <merge_matched>
and the latter corresponds to
<merge_matched>::=
{ UPDATE SET <set_clause> | DELETE }
Ergo, you cannot insert into the source table there.
You could use triggers for auto-insertion when you insert something into Table1, but a trigger would not be able to insert the proper Share_name - it just won't know it.
So you have two options, I guess.
1) Write a T-SQL code block - look into stored procedures. I think there is also a construct for executing an anonymous code block in MS SQL, like the EXECUTE BLOCK command in Firebird SQL Server, but I don't know for sure.
2) Create an updatable SQL VIEW joining Table1 and Table2 to show the most current date, so that when you insert a row into this view, the view's on-insert trigger actually inserts rows into both tables. And when you update the data in the view, the on-update trigger modifies the data.
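To show what the code block in option 1 would need to do, here is a minimal sketch of the steps in Python with sqlite3. It is a stand-in for a T-SQL stored procedure, not T-SQL itself. The table names share_codes and share_prices, the column names, and the auto-generated integer ID are all assumptions made for illustration.

```python
import sqlite3

def record_price(conn, share_name, price_date, price):
    """Look up (or create) the share ID, then update or insert the price row."""
    with conn:  # all steps commit or roll back together
        row = conn.execute(
            "SELECT share_code_id FROM share_codes WHERE share_name = ?",
            (share_name,),
        ).fetchone()
        if row is None:
            # Unknown share: add it to the lookup table first.
            share_id = conn.execute(
                "INSERT INTO share_codes (share_name) VALUES (?)", (share_name,)
            ).lastrowid
        else:
            share_id = row[0]
        cur = conn.execute(
            "UPDATE share_prices SET price = ? "
            "WHERE share_code_id = ? AND price_date = ?",
            (price, share_id, price_date),
        )
        if cur.rowcount == 0:  # no price for that date yet
            conn.execute(
                "INSERT INTO share_prices (share_code_id, price_date, price) "
                "VALUES (?, ?, ?)",
                (share_id, price_date, price),
            )
    return share_id

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE share_codes (share_code_id INTEGER PRIMARY KEY,
                              share_name TEXT UNIQUE);
    CREATE TABLE share_prices (share_code_id INTEGER, price_date TEXT, price REAL,
                               PRIMARY KEY (share_code_id, price_date));
""")
zzz_id = record_price(conn, "ZZZ", "2013-05-23", 1000)  # creates share and price
record_price(conn, "ZZZ", "2013-05-23", 1010)           # updates the same row
print(conn.execute("SELECT * FROM share_prices").fetchall())  # [(1, '2013-05-23', 1010.0)]
```

Only the price table needs update-or-insert logic. The lookup table only ever needs an insert, which is why a single MERGE does not fit this problem well.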
This is probably a very simple question for you SQL folks out there.
I have a temp table (TMP_VALIDATION_DATA) in which I've stored the old and new values of some fields I wish to update in a production table (PROVIDER_SERVICE), plus the uuids of the PROVIDER_SERVICE records that need to be updated.
What I want to accomplish is this, in pseudo-code:
For every prov_svc_uuid uuid in TMP_VALIDATION_DATA table
Set PROVIDER_SERVICE_RATE.END_DATE = NewPvSvcEndDate
Where [uuid in temp table] = [uuid in PROVIDER_SERVICE table]
end for
Is this Update statement going to accomplish what I need?
update PROVIDER_SERVICE
set END_DATE = (
select NewPvSvcEndDate
from TMP_VALIDATION_DATA T
where T.PROVIDER_SERVICE_UUID = PROVIDER_SERVICE.PROVIDER_SERVICE_UUID
)
If my UPDATE is incorrect, will you please provide the correction? Thanks.
Your query will update every record in PROVIDER_SERVICE (rows with no match in the temp table get END_DATE set to NULL), and you will get an error if the subquery returns more than one record for a UUID. I would also change your syntax to a JOIN similar to below.
update P
set END_DATE = T.NewPvSvcEndDate
FROM PROVIDER_SERVICE P
JOIN TMP_VALIDATION_DATA T
ON P.PROVIDER_SERVICE_UUID = T.PROVIDER_SERVICE_UUID
If you don't want to UPDATE all records, then add a WHERE clause.
My suggestion is if you don't know how many records would be included in the UPDATE, write your query as a SELECT first, then change it to an UPDATE. So for this one:
SELECT P.END_DATE, T.NewPvSvcEndDate
FROM PROVIDER_SERVICE P
JOIN TMP_VALIDATION_DATA T
ON P.PROVIDER_SERVICE_UUID = T.PROVIDER_SERVICE_UUID
This will either update all records, or error out (not sure what happens when you try to update a column with multiple values like that).
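To check what the original correlated-subquery UPDATE does to rows that have no match in the temp table, here is a small sketch on SQLite via Python's sqlite3 module (a stand-in, not the asker's database). The sample rows and the helper end_dates_after are made up. It does not cover the duplicate-UUID case, where SQLite behaves differently from SQL Server.

```python
import sqlite3

# The UPDATE from the question, unchanged apart from formatting.
UNGUARDED = """
UPDATE PROVIDER_SERVICE
SET END_DATE = (SELECT T.NewPvSvcEndDate
                FROM TMP_VALIDATION_DATA T
                WHERE T.PROVIDER_SERVICE_UUID = PROVIDER_SERVICE.PROVIDER_SERVICE_UUID)
"""
# Same statement, restricted to rows that have a match in the temp table.
GUARDED = UNGUARDED + """
WHERE EXISTS (SELECT 1
              FROM TMP_VALIDATION_DATA T
              WHERE T.PROVIDER_SERVICE_UUID = PROVIDER_SERVICE.PROVIDER_SERVICE_UUID)
"""

def end_dates_after(update_sql):
    """Run one UPDATE against fresh sample data and return {uuid: END_DATE}."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PROVIDER_SERVICE (PROVIDER_SERVICE_UUID TEXT PRIMARY KEY,
                                       END_DATE TEXT);
        CREATE TABLE TMP_VALIDATION_DATA (PROVIDER_SERVICE_UUID TEXT,
                                          NewPvSvcEndDate TEXT);
        INSERT INTO PROVIDER_SERVICE VALUES ('a', '2020-01-01'), ('b', '2020-02-02');
        INSERT INTO TMP_VALIDATION_DATA VALUES ('a', '2021-12-31');
    """)
    conn.execute(update_sql)
    return dict(conn.execute(
        "SELECT PROVIDER_SERVICE_UUID, END_DATE FROM PROVIDER_SERVICE"))

print(end_dates_after(UNGUARDED))  # {'a': '2021-12-31', 'b': None}
print(end_dates_after(GUARDED))    # {'a': '2021-12-31', 'b': '2020-02-02'}
```

The JOIN form of the UPDATE has the same effect as the guarded version: only rows with a matching UUID are touched.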