Dynamic INSERT script in SQL - sql-server

Is there a way to create dynamic INSERT scripts from a stored procedure in SQL?
I was asked to create archive functionality that takes a date range, creates INSERT statements for all the rows in that range, and then navigates all the foreign keys, generating cascading INSERT scripts for those "dependency" tables too.
Is this even possible? I have been researching this and couldn't find anything.

If you are trying to do inserts from other tables dynamically, you could do this with a combination of INSERT and SELECT statements. I am not sure if this is what you are after.
E.g.:
INSERT INTO Person_Archive(Id,Name, CountryId)
SELECT Id, Name, CountryId FROM Person
WHERE CreatedAt BETWEEN '2016-01-25 00:00:00' AND '2016-12-25 23:59:59'
This is the harder one: you only want to add countries to the archive that don't already exist there. There are, of course, more ways to skin a cat here; this just articulates the approach.
INSERT INTO Country_Archive (Id, Name) -- Insert into the archive
SELECT Country.Id, Country.Name FROM Person -- Select the rows
JOIN Country ON Country.Id = Person.CountryId -- Join on Country to get the country name
WHERE CreatedAt BETWEEN '2016-01-25 00:00:00' AND '2016-12-25 23:59:59'
-- Criteria for the query
AND (SELECT COUNT(*) FROM Country_Archive WHERE Country_Archive.Id = Person.CountryId) = 0
-- This checks that the CountryId hasn't already been added
GROUP BY Country.Id, Country.Name -- This gets unique records only, e.g. if multiple people share the same country.
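Since the question mentions doing this from a stored procedure, here is a minimal sketch of how the two archive inserts above could be wrapped with a parameterized date range. The procedure and parameter names are assumptions, not from the question:
CREATE PROCEDURE dbo.ArchivePersons -- hypothetical name
    @FromDate DATETIME,
    @ToDate   DATETIME
AS
BEGIN
    SET NOCOUNT ON;

    -- Archive the "dependency" table first so the foreign keys are satisfied
    INSERT INTO Country_Archive (Id, Name)
    SELECT c.Id, c.Name
    FROM Person p
    JOIN Country c ON c.Id = p.CountryId
    WHERE p.CreatedAt BETWEEN @FromDate AND @ToDate
      AND NOT EXISTS (SELECT 1 FROM Country_Archive ca WHERE ca.Id = c.Id)
    GROUP BY c.Id, c.Name;

    -- Then archive the main rows
    INSERT INTO Person_Archive (Id, Name, CountryId)
    SELECT Id, Name, CountryId
    FROM Person
    WHERE CreatedAt BETWEEN @FromDate AND @ToDate;
END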


How to use cross apply string split result to update a table in sql?

I am trying to split a column ('categories') of a table 'movies_titles' which has comma-separated values in it.
E.g.:
ID  title    categories
1   Movie A  Comedy, Drama, Romance
2   Movie B  Animation
3   Movie C  Documentary, Life changing
I want to split the comma-delimited string, place each value in a separate row, and update the table.
-- this query shows the split strings the way I want them
SELECT *
FROM dbo.movies_titles
CROSS APPLY
string_split(categories, ',')
Output:
ID  title    categories                  value
1   Movie A  Comedy, Drama, Romance      Comedy
1   Movie A  Comedy, Drama, Romance      Drama
1   Movie A  Comedy, Drama, Romance      Romance
2   Movie B  Animation                   Animation
3   Movie C  Documentary, Life changing  Documentary
3   Movie C  Documentary, Life changing  Life changing
I want to use an UPDATE query to set the result obtained in the value column. I don't just want to view the result with a SELECT query; I want to permanently save the changes to the table. How do I achieve this in SQL Server?
You can do something close to your intention by creating new rows, because an UPDATE statement won't create the additional rows produced by the split.
There can be issues if the ID column is unique, like a primary key, and there is a need to keep the title associated with that column.
I've created two scenarios on DB Fiddle, showing how you can do this using only one table as the question instructed, but a better alternative would be to save this information on another table.
This code on DB Fiddle: link
--Assuming your table is something like this
create table movies_id_as_pk (
ID int identity(1,1) primary key,
title varchar(200),
categories varchar(200),
category varchar(200)
)
--Or this
create table movies_other_pk (
another_id int identity(1,1) primary key,
ID int,
title varchar(200),
categories varchar(200),
category varchar(200)
)
--The example data
set identity_insert movies_id_as_pk on
insert into movies_id_as_pk (ID, title, categories) values
(1, 'Movie A', 'Comedy, Drama, Romance'),
(2, 'Movie B', 'Animation'),
(3, 'Movie C', 'Documentary, Life changing')
set identity_insert movies_id_as_pk off
insert into movies_other_pk (ID, title, categories)
select ID, title, categories from movies_id_as_pk
--You can't update either table directly: because the result of the split
--has more rows than the table, each row would just keep one of the split values:
update m set category = rtrim(ltrim(s.value))
from movies_id_as_pk m
cross apply string_split(m.categories, ',') as s
update m set category = rtrim(ltrim(s.value))
from movies_other_pk m
cross apply string_split(m.categories, ',') as s
select * from movies_id_as_pk
select * from movies_other_pk
--What you can do is create the additional rows, inserting them:
--First, let's undo what the last instructions have changed
update movies_id_as_pk set category=NULL
update movies_other_pk set category=NULL
--Then use inserts to create the rows with the categories split
insert into movies_id_as_pk (title, category)
select m.title, rtrim(ltrim(s.value))
from movies_id_as_pk m
cross apply string_split(m.categories, ',') as s
insert into movies_other_pk (ID, title, category)
select m.ID, m.title, rtrim(ltrim(s.value))
from movies_other_pk m
cross apply string_split(m.categories, ',') as s
select * from movies_id_as_pk
select * from movies_other_pk
It actually is possible to insert and update at the same time; that is to say, we can update each row with a single category, then create new rows for the extra ones.
We can use MERGE for this, with the same table as both source and target. We just need to split the source, add a row number partitioned by each original row, and then filter the ON clause to match only the first row.
WITH Source AS (
SELECT
m.ID,
m.title,
category = TRIM(cat.value),
rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL))
FROM movies m
CROSS APPLY STRING_SPLIT(m.categories, ',') cat
)
MERGE movies t
USING Source s
ON s.ID = t.ID AND s.rn = 1
WHEN MATCHED THEN
UPDATE
SET categories = s.category
WHEN NOT MATCHED THEN
INSERT (ID, title, categories)
VALUES (s.ID, s.title, s.category)
;
db<>fiddle
I wouldn't necessarily recommend this as a general solution, though, because it appears you actually have other normalization problems to sort out first. You should really have separate tables for all this information (a sketch follows the list):
Movie
Category
MovieCategory
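As a minimal sketch of that normalized layout (the table and column names here are assumptions, not from the question):
create table Movie (
MovieID int identity(1,1) primary key,
title varchar(200)
)
create table Category (
CategoryID int identity(1,1) primary key,
name varchar(200)
)
create table MovieCategory (
MovieID int references Movie (MovieID),
CategoryID int references Category (CategoryID),
primary key (MovieID, CategoryID)
)
-- MovieCategory could then be populated by reusing the same
-- CROSS APPLY STRING_SPLIT pattern shown above, joining the
-- split values back to Category by name.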

Create trigger to keep the latest record

I have a Product table which keeps on adding rows with product_id and price. It has millions of rows.
It has product_id as the primary key, like below.
CREATE TABLE ProductPrice(
product_id VARCHAR2(10),
prod_date DATE ,
price NUMBER(8,0) ,
PRIMARY KEY (product_id)
)
Now, since this has millions of rows, getting the latest price takes a lot of time.
So to manage the latest price, I have created another table which will keep only the latest price, in the same format.
CREATE TABLE ProductPriceLatest(
product_id VARCHAR2(10),
prod_date DATE ,
price NUMBER(8,0) ,
PRIMARY KEY (product_id)
)
And on every insert on the original table, I will write a trigger which will update the row in this table.
But how can I get the newly inserted values inside the trigger body?
I have tried something like this:
CREATE OR REPLACE TRIGGER TRIG_HISTory
AFTER INSERT
on ProductPriceLatest
FOR EACH ROW
DECLARE
BEGIN
UPDATE latest_price
SET price = NEW.price ,
WHERE product_id = NEW.product_id ;
END;
Thanks in advance.
You need to use the :new keyword to refer to the new values (as opposed to :old). The trigger should also be on the source table, not the destination, and it's better to use an AFTER trigger:
CREATE OR REPLACE TRIGGER TRIG_HISTORY
AFTER INSERT ON source_table_name
FOR EACH ROW
DECLARE
BEGIN
MERGE INTO dest_table_name d
USING (select :new.price p, :new.product_id p_id from dual) s
ON (d.product_id = s.p_id)
WHEN MATCHED THEN
UPDATE SET d.price = s.p
WHEN NOT MATCHED THEN
INSERT (price, product_id)
VALUES (s.p, s.p_id);
END;
Retrieving the latest price from your first table should be fast if you have the correct index. Building the correct index on your ProductPrice table is a far better solution to your problem than trying to maintain a separate table.
Your query to get the latest prices would look like this.
SELECT p.product_id, p.prod_date, p.price
FROM ProductPrice p
JOIN (
SELECT product_id, MAX(prod_date) latest_prod_date
FROM ProductPrice
GROUP BY product_id
) m ON p.product_id = m.product_id
AND p.prod_date = m.latest_prod_date
WHERE p.product_id = ????
This works because the subquery looks up the latest product date for each product. It then uses that information to find the right row in the table to show you.
If you create a compound index on (product_id, prod_date, price) this query will run almost miraculously fast. That's because the query planner can find the correct index item in O(log n) time or better.
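For reference, a minimal sketch of that compound index (the index name is an assumption):
CREATE INDEX productprice_id_date_price
ON ProductPrice (product_id, prod_date, price);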
You can make it into a view like this:
CREATE OR REPLACE VIEW ProductPriceLatest AS
SELECT p.product_id, p.prod_date, p.price
FROM ProductPrice p
JOIN (
SELECT product_id, MAX(prod_date) latest_prod_date
FROM ProductPrice
GROUP BY product_id
) m ON p.product_id = m.product_id
AND p.prod_date = m.latest_prod_date;
Then you can use the view like this:
SELECT * FROM ProductPriceLatest WHERE product_id = ???
and get the same high performance.
This is easier, less error-prone, and just as fast as creating a separate table and maintaining it. By the way, the DBMS jargon for the table you propose to create is a materialized view.
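In Oracle, that materialized view could look roughly like the sketch below. The name and refresh policy are assumptions; a fast, on-commit refresh would additionally require materialized view logs, so a complete on-demand refresh is shown here:
CREATE MATERIALIZED VIEW ProductPriceLatestMV
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT product_id, MAX(prod_date) AS latest_prod_date
FROM ProductPrice
GROUP BY product_id;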

How to delete Duplicate records in snowflake database table

How do I delete duplicate records from a Snowflake table? Thanks.
ID  Name
1   Apple
1   Apple
2   Apple
3   Orange
3   Orange
Result should be:
ID  Name
1   Apple
2   Apple
3   Orange
Adding here a solution that doesn't recreate the table, because recreating a table can break a lot of existing configurations and history.
Instead, we are going to delete only the duplicate rows and insert a single copy of each, within a transaction:
-- find all duplicates
create or replace transient table duplicate_holder as (
select $1, $2, $3
from some_table
group by 1,2,3
having count(*)>1
);
-- time to use a transaction to insert and delete
begin transaction;
-- delete duplicates
delete from some_table a
using duplicate_holder b
where (a.$1,a.$2,a.$3)=(b.$1,b.$2,b.$3);
-- insert single copy
insert into some_table
select *
from duplicate_holder;
-- we are done
commit;
Advantages:
Doesn't recreate the table
Doesn't modify the original table
Only deletes and inserts duplicated rows (good for time travel storage costs, avoids unnecessary reclustering)
All in a transaction
If you have some primary key as such:
CREATE TABLE fruit (key number, id number, name text);
insert into fruit values (1,1, 'Apple'), (2,1,'Apple'),
(3,2, 'Apple'), (4,3, 'Orange'), (5,3, 'Orange');
then you can delete like this:
DELETE FROM fruit
WHERE key in (
SELECT key
FROM (
SELECT key
,ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY key) AS rn
FROM fruit
)
WHERE rn > 1
);
But if you do not have a unique key, then you cannot delete that way. At that point you can create a deduplicated copy:
CREATE TABLE new_table_name AS
SELECT id, name FROM (
SELECT id
,name
,ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY id, name) AS rn
FROM table_name
)
WHERE rn = 1
and then swap them
ALTER TABLE table_name SWAP WITH new_table_name
Here's a very simple approach that doesn't need any temporary tables. It will work very nicely for small tables, but might not be the best approach for large tables.
insert overwrite into some_table
select distinct * from some_table
;
The OVERWRITE keyword means that the table will be truncated before the insert takes place.
Snowflake does not enforce primary keys; their use is primarily with ERD tools.
Snowflake does not have something like a ROWID either, so there is no way to identify duplicates for deletion.
It is possible to temporarily add an "is_duplicate" column, e.g. numbering all the duplicates with the ROW_NUMBER() function, then delete all records with "is_duplicate" > 1, and finally drop the utility column.
Another way is to create a duplicate table and swap, as others have suggested.
However, constraints and grants must be kept. One way to do this is:
CREATE TABLE new_table LIKE old_table COPY GRANTS;
INSERT INTO new_table SELECT DISTINCT * FROM old_table;
ALTER TABLE old_table SWAP WITH new_table;
The code above removes exact duplicates. If you want to end up with one row per "PK", you need to include logic to select which copy you want to keep.
This illustrates the importance of adding update-timestamp columns in a Snowflake data warehouse.
This has been bothering me for some time as well. Since Snowflake added support for QUALIFY, you can now create a deduplicated table with a single statement, without subselects:
CREATE TABLE fruit (id number, nam text);
insert into fruit values (1, 'Apple'), (1,'Apple'),
(2, 'Apple'), (3, 'Orange'), (3, 'Orange');
CREATE OR REPLACE TABLE fruit AS
SELECT * FROM
fruit
qualify row_number() OVER (PARTITION BY id, nam ORDER BY id, nam) = 1;
SELECT * FROM fruit;
Of course, you are left with a new table and lose the table history, primary keys, foreign keys, and such.
Based on the above ideas, the following query worked perfectly in my case.
CREATE OR REPLACE TABLE SCHEMA.table
AS
SELECT
DISTINCT *
FROM
SCHEMA.table
;
Your question boils down to: how can I delete one of two perfectly identical rows? You can't. You can only do a DELETE FROM fruit WHERE ID = 1 AND Name = 'Apple';, and then both rows will go away. Or you don't, and keep both.
For some databases, there are workarounds using internal row IDs, but there isn't one in Snowflake, see https://support.snowflake.net/s/question/0D50Z00008FQyGqSAL/is-there-an-internalmetadata-unique-rowid-in-snowflake-that-i-can-reference . You cannot limit deletes, either, so your only option is to create a new table and swap.
An additional note on Hans Henrik Eriksen's remark on the importance of update timestamps: they are a real help when the duplicates were added later. If, for example, you want to keep the newer values, you can then do this:
-- setup
create table fruit (ID Integer, Name VARCHAR(16777216), "UPDATED_AT" TIMESTAMP_NTZ);
insert into fruit values (1, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (2, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (3, 'Orange', CURRENT_TIMESTAMP::timestamp_ntz);
-- wait > 1 nanosecond
insert into fruit values (1, 'Apple', CURRENT_TIMESTAMP::timestamp_ntz)
, (3, 'Orange', CURRENT_TIMESTAMP::timestamp_ntz);
-- delete older duplicates (DESC)
DELETE FROM fruit
WHERE (ID
, UPDATED_AT) IN (
SELECT ID
, UPDATED_AT
FROM (
SELECT ID
, UPDATED_AT
, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY UPDATED_AT DESC) AS rn
FROM fruit
)
WHERE rn > 1
);
A simple UNION eliminates duplicates for the use case of deduplicating on all columns with no PKs.
Anyway, the problem should be solved as early as possible in the ingestion pipeline, and/or by using SCD etc. Looking for raw magic to delete duplicates is wrong in principle; SCD with a high-resolution timestamp solves any such problem.
Do you want to fix a massive duplicate load? Then add a column like a batch ID and remove all the records from that batch load.
It's like staying healthy; you have two approaches:
eat a lot > get fat > go to the gym to burn it off
eat well > have a healthy lifestyle and no need for a gym
So before discussing the best gym, try changing your lifestyle.
Hope this helps. Learn to put pressure upstream on the data producers instead of living like Jesus Christ, trying to clean up everyone's mess.
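As a sketch of the batch-ID idea (the batch_id column and the literal value here are hypothetical):
-- assuming each load stamps its rows with a batch identifier,
-- undoing a duplicated load is a plain delete:
DELETE FROM some_table
WHERE batch_id = 'load_2024_01_15_001';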
The following solution is effective if you are looking at one or a few columns as primary-key references for the table.
-- Create a temp table to hold our duplicates (only second occurrence)
CREATE OR REPLACE TRANSIENT TABLE temp_table AS (
SELECT [col1], [col2], .. [coln]
FROM (
SELECT *, ROW_NUMBER () OVER(
PARTITION BY [pk]1, [pk]2, .. [pk]m
ORDER BY [pk]1, [pk]2, .. [pk]m) AS duplicate_count
FROM [schema].[table]
) WHERE duplicate_count = 2
);
-- Delete all the duplicate records from the table
DELETE FROM [schema].[table] t1
USING temp_table t2
WHERE
t1.[pk]1 = t2.[pk]1 AND
t1.[pk]2 = t2.[pk]2 AND
..
t1.[pk]m = t2.[pk]m;
-- Insert single copy using the temp_table in the original table
INSERT INTO [schema].[table]
SELECT *
FROM temp_table;
This is inspired by @Felipe Hoffa's answer:
-- create a table with the dupes and take the max id
create or replace transient table duplicate_holder as (
select max(S.ID) ID, some_field, count(some_field) numberAssets
from some_table S
group by some_field
having count(some_field)>1
)
-- join back to the original table on the field, excluding the max ID held in the duplicate table, and delete
delete from some_table as t
USING duplicate_holder as d
WHERE t.some_field=d.some_field
and t.id <> d.id
Not sure if people are still interested in this, but I've used the query below, which is more elegant and seems to have worked:
create or replace table {{your_table}} as
select * from {{your_table}}
qualify row_number() over (partition by {{criteria_columns}} order by 1) = 1

What is wrong with my trigger? No results are inserted

The trigger below selects IDs from one table (employeeInOut), sums ints in a column of that table matching those IDs, and is supposed to insert the result into another table (monthlyHours). I can't figure out if this is a syntax problem (nothing shows up in IntelliSense); all it says is "trigger executed successfully" and nothing is inserted.
Trigger ->
GO
CREATE TRIGGER empTotalsHoursWorked
ON employeeInOut
FOR INSERT, DELETE, UPDATE
AS
BEGIN
INSERT INTO monthlyHours(employeeID, monthlyHours)
SELECT (SELECT employeeID FROM employeeInOut),
SUM(dailyHours) AS monthlyHours
FROM employeeInOut
WHERE employeeInOut.employeeID=(SELECT employeeID FROM monthlyHours)
END
GO
I have reworked this trigger many times and this is the version with no errors; however, nothing is inserted and there seem to be no results. Any advice or answers appreciated.
Going with a couple of assumptions here, one being that the monthlyHours table contains employeeID and monthlyHours.
With that being said, I think you are going to need multiple triggers depending on the action. Below is an example based on an insert into the employeeInOut table.
GO
CREATE TRIGGER empTotalsHoursWorked
ON employeeInOut
AFTER INSERT
AS
BEGIN
DECLARE @employeeID INT
DECLARE @monthlyHours INT
SELECT @employeeID = INSERTED.employeeID
FROM INSERTED
SELECT @monthlyHours = SUM(dailyHours)
FROM employeeInOut
WHERE employeeInOut.employeeID = @employeeID
INSERT INTO monthlyHours(employeeID, monthlyHours)
VALUES (@employeeID, @monthlyHours)
END
GO
This will insert a new row, of course. You may want to modify this to update the row if it already exists in the monthlyHours table for that employee; a sketch of that follows.
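A minimal sketch of that modification, reusing the variables declared in the trigger above:
IF EXISTS (SELECT 1 FROM monthlyHours WHERE employeeID = @employeeID)
    UPDATE monthlyHours
    SET monthlyHours = @monthlyHours
    WHERE employeeID = @employeeID
ELSE
    INSERT INTO monthlyHours (employeeID, monthlyHours)
    VALUES (@employeeID, @monthlyHours)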
I would really advise against a trigger for a simple running total like this, your best option would be to create a view. Something like:
CREATE VIEW dbo.MonthlyHours
AS
SELECT EmployeeID,
monthlyHours = SUM(dailyHours)
FROM dbo.employeeInOut
GROUP BY EmployeeID;
GO
Then you can access it in the same way as your table:
SELECT *
FROM dbo.MonthlyHours;
If you are particularly worried about performance, then you can always index the view:
CREATE VIEW dbo.MonthlyHours
WITH SCHEMABINDING
AS
SELECT EmployeeID,
monthlyHours = SUM(dailyHours),
RecordCount = COUNT_BIG(*)
FROM dbo.employeeInOut
GROUP BY EmployeeID;
GO
CREATE UNIQUE CLUSTERED INDEX UQ_MonthlyHours__EmployeeID ON dbo.MonthlyHours(EmployeeID);
Now whenever you add or remove records from employeeInOut SQL Server will automatically update the clustered index for the view, you just need to use the WITH (NOEXPAND) query hint to ensure that you aren't running the query behind the view:
SELECT *
FROM dbo.MonthlyHours WITH (NOEXPAND);
Finally, based on the fact that the table is called monthlyHours, I am guessing the totals should be per month; as such, I assume you also have a date field in employeeInOut, in which case your view might be more like:
CREATE VIEW dbo.MonthlyHours
WITH SCHEMABINDING
AS
SELECT EmployeeID,
FirstDayOfMonth = DATEADD(MONTH, DATEDIFF(MONTH, 0, [YourDateField]), 0),
monthlyHours = SUM(dailyHours),
RecordCount = COUNT_BIG(*)
FROM dbo.employeeInOut
GROUP BY EmployeeID, DATEADD(MONTH, DATEDIFF(MONTH, 0, [YourDateField]), 0);
GO
CREATE UNIQUE CLUSTERED INDEX UQ_MonthlyHours__EmployeeID_FirstDayOfMonth
ON dbo.MonthlyHours(EmployeeID, FirstDayOfMonth);
And you can use the view in the same way described above.
ADDENDUM
For what it is worth, for your trigger to work properly you need to consider all cases:
Inserting a record where that employee already exists in MonthlyHours (Update existing).
Inserting a record where that employee does not exist in MonthlyHours (insert new).
Updating a record (update existing)
Deleting a record (update existing, or delete)
To handle all of these cases you can use MERGE:
CREATE TRIGGER empTotalsHoursWorked
ON employeeInOut
FOR INSERT, DELETE, UPDATE
AS
BEGIN
WITH ChangesToMake AS
( SELECT EmployeeID, SUM(dailyHours) AS MonthlyHours
FROM ( SELECT EmployeeID, dailyHours
FROM Inserted
UNION ALL
SELECT EmployeeID, -dailyHours
FROM deleted
) AS t
GROUP BY EmployeeID
)
MERGE INTO monthlyHours AS m
USING ChangesToMake AS c
ON c.EmployeeID = m.EmployeeID
WHEN MATCHED THEN UPDATE
SET MonthlyHours = m.MonthlyHours + c.MonthlyHours -- apply the delta to the running total
WHEN NOT MATCHED BY TARGET THEN
INSERT (EmployeeID, MonthlyHours)
VALUES (c.EmployeeID, c.MonthlyHours);
END
GO

Create trigger to only allow updates for certain employees

We have a database with one table for all of our employee information (name, pay, etc.). One of the columns in the table is "commission". I am trying to write a trigger that will only allow the "commission" column to be updated or inserted if it is for a Sales Representative. If an update is attempted on any other employee, it should print out an error. I would also like this trigger to print all of the information from each update, failed or not, to a separate table. What is the best way for me to go about doing this?
I am relatively new to SQL Server, so any help here would be greatly appreciated!
Thanks,
EDIT:
Here is what i have so far:
CREATE TRIGGER CommissionUpdate ON Employees
FOR UPDATE
AS IF UPDATE(Commission)
Declare
@Old_Comm money
, @New_Comm money
, @EmpID int
Select @EmpID = (Select EmployeeID From Deleted)
Select @Old_Comm = (Select Commission From Deleted)
Select @New_Comm = (Select Commission From Inserted)
BEGIN
INSERT INTO ChangeLog (EmpID, [User], [Date], OldComm, NewComm)
VALUES (@EmpID, User_Name(), GetDate(), @Old_Comm, @New_Comm)
END
Basically, all this does is add entries to the ChangeLog table when the Commission column is updated. I'm still having trouble adding the constraint to only allow "commission" to be updated for Sales Reps.
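One hedged way to add that check inside the trigger, assuming Employees has a JobTitle column (that column name and the title text are assumptions). Note that rolling back would also undo the ChangeLog insert, so logging failed updates would need to happen outside the rolled-back work (e.g. captured into a table variable first):
CREATE TRIGGER CommissionCheck ON Employees
FOR UPDATE
AS
IF UPDATE(Commission)
BEGIN
    -- Block the change when any updated row is not a Sales Representative
    IF EXISTS (
        SELECT 1
        FROM Inserted i
        WHERE i.JobTitle <> 'Sales Representative' -- assumed column and value
    )
    BEGIN
        RAISERROR('Commission may only be changed for Sales Representatives.', 16, 1)
        ROLLBACK TRANSACTION
    END
END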
