SQL Server : insert identity - sql-server

I have 2 tables ProductSize and Product.
Product table:
ProductID ProductCode ProductSizeID
Product Size table:
ProductSizeID PackperCase ItemsperCase ItemSize
I have a stored procedure which populates both these tables, however I can't populate the Products table without populating the Product Size table (as I need the productsizeID).
How can I tackle this?
I need something which shows the last ID I just inserted into the Productsize table. The stored procedure inserts 1 record at a time, but I don't really want to use MAX() to get the ID because other things may be going on in the database.

You need to first insert the row into ProductSize and get the ID back:
INSERT INTO dbo.ProductSize(PackPerCase, ItemsPerCase, ItemSize)
VALUES( ....., ....., .....);
DECLARE #NewID INT = SCOPE_IDENTITY();
and then use that in your second insert into the Products table:
INSERT INTO dbo.Products(ProductCode, ProductSizeID)
VALUES( ....., #NewID);
and you're done!

Related

Snowflake - how to do multiple DML operations on same primary key in a specific order?

I am trying to set up continuous data replication in Snowflake. I get the transactions happened in source system and I need to perform them in Snowflake in the same order as source system. I am trying to use MERGE for this, but when there are multiple operations on same key in source system, MERGE is not working correctly. It either misses an operation or returns duplicate row detected during DML operation error.
Please note that the transactions need to happen in exact order and it is not possible to take the latest transaction for a key and just do it (like if a record has been INSERTED and UPDATED, in Snowflake too it needs to be inserted first and then updated even though insert is only transient state) .
Here is the example:
create or replace table employee_source (
id int,
first_name varchar(255),
last_name varchar(255),
operation_name varchar(255),
binlogkey integer
)
create or replace table employee_destination ( id int, first_name varchar(255), last_name varchar(255) );
insert into employee_source values (1,'Wayne','Bells','INSERT',11);
insert into employee_source values (1,'Wayne','BellsT','UPDATE',12);
insert into employee_source values (2,'Anthony','Allen','INSERT',13);
insert into employee_source values (3,'Eric','Henderson','INSERT',14);
insert into employee_source values (4,'Jimmy','Smith','INSERT',15);
insert into employee_source values (1,'Wayne','Bellsa','UPDATE',16);
insert into employee_source values (1,'Wayner','Bellsat','UPDATE',17);
insert into employee_source values (2,'Anthony','Allen','DELETE',18);
MERGE into employee_destination as T using (select * from employee_source order by binlogkey)
AS S
ON T.id = s.id
when not matched
And S.operation_name = 'INSERT' THEN
INSERT (id,
first_name,
last_name)
VALUES (
S.id,
S.first_name,
S.last_name)
when matched AND S.operation_name = 'UPDATE'
THEN
update set T.first_name = S.first_name, T.last_name = S.last_name
When matched
And S.operation_name = 'DELETE' THEN DELETE;
I am expecting to see - Bellsat - as last name for employee id 1 in the employee_destination table after all rows get processed. Same way, I should not see emp id 2 in the employee_destination table.
Is there any other alternative to MERGE to achieve this? Basically to go over every single DML in the same order (using binlogkey column for ordering) .
thanks.
You need to manipulate your source data to ensure that you only have one record per key/operation otherwise the join will be non-deterministic and will (dpending on your settings) either error or will update using a random one of the applicable source records. This is covered in the documentation here https://docs.snowflake.com/en/sql-reference/sql/merge.html#duplicate-join-behavior.
In any case, why would you want to update a record only for it to be overwritten by another update - this would be incredibly inefficient?
Since your updates appear to include the new values for all rows, you can use a window function to get to just the latest incoming change, and then merge those results into the target table. For example, the select for that merge (with the window function to get only the latest change) would look like this:
with SOURCE_DATA as
(
select COLUMN1::int ID
,COLUMN2::string FIRST_NAME
,COLUMN3::string LAST_NAME
,COLUMN4::string OPERATION_NAME
,COLUMN5::int PROCESSING_ORDER
from values
(1,'Wayne','Bells','INSERT',11),
(1,'Wayne','BellsT','UPDATE',12),
(2,'Anthony','Allen','INSERT',13),
(3,'Eric','Henderson','INSERT',14),
(4,'Jimmy','Smith','INSERT',15),
(1,'Wayne','Bellsa','UPDATE',16),
(1,'Wayne','Bellsat','UPDATE',17),
(2,'Anthony','Allen','DELETE',18)
)
select * from SOURCE_DATA
qualify row_number() over (partition by ID order by PROCESSING_ORDER desc) = 1
That will produce a result set that has only the changes required to merge into the target table:
ID
FIRST_NAME
LAST_NAME
OPERATION_NAME
PROCESSING_ORDER
1
Wayne
Bellsat
UPDATE
17
2
Anthony
Allen
DELETE
18
3
Eric
Henderson
INSERT
14
4
Jimmy
Smith
INSERT
15
You can then change the when not matched to remove the operation_name. If it's listed as an update and it's not in the target table, it's because it was inserted in a previous operation in the new changes.
For the when matched clause, you can use the operation_name to determine if the row should be updated or deleted.

Perform SCD2 on snowflake table based upon oracle input data

currently am sourcing data from oracle
As a part of intial load ingested all history data from oracle table oracle_a to snowflake table "snow_a" using named stage and copy into commands.
I would like to perform SCD2 on snow_a table based upon oracle_a table.
I mean if any new record added to Oracle_a table then that record to be inserted and any changes to existing record of oracle_a table ,
existing record of snow_a table to be expired and insert the record. Further details refer below image.
oracle_a table has key columns key_col1,key_col2,key_col3 as mentioned in below image. attr1 and attr2 are other attributes of the table enter image description here
Implementing SCD Type 2 functionality on a table in Snowflake is no different than in any other relational database. However, there is additional functionality that can help with this process. Please have a look at this blog post series on using Snowflake Streams and Tasks to perform the SCD logic.
https://www.snowflake.com/blog/building-a-type-2-slowly-changing-dimension-in-snowflake-using-streams-and-tasks-part-1/
Cheers,
Michael Rainey
Ok so here is what I found - though you may need to adjust were the update and insert come from - since oracle_a is not in Snowflake.
CREATE TABLE snowflake_a(key_col1 varchar(10), key_col2 varchar(10), key_col3 varchar(10), attr1 varchar(8), attr2 varchar(10), eff_ts TIMESTAMP, exp_ts TIMESTAMP, valid varchar(10));
DROP table oracle_a;
INSERT INTO snowflake_a VALUES('PT_1', 'DL_1', 'RPT_1', 'Address1', 'APT_1', current_date, current_date, 'Active');
CREATE TABLE oracle_a(key_col1 varchar(10), key_col2 varchar(10), key_col3 varchar(10), attr1 varchar(8), attr2 varchar(8), eff_ts TIMESTAMP, exp_ts TIMESTAMP);
INSERT INTO oracle_a
VALUES( 'PT_1', 'DL_1', 'RPT_1', 'Address1', 'APT_1', '10/24/2019', '12/31/1999');
UPDATE snowflake_a
SET valid = 'Expired'
WHERE valid LIKE '%Active%';
SELECT * FROM snowflake_a;
INSERT INTO snowflake_a VALUES( 'PT_1', 'DL_1', 'RPT_1', 'Address1', 'APT_1', '10/24/2019', '12/31/1999', 'Active');
SELECT * FROM snowflake_a;
Or better yet, what are us using to connect from your Oracle ecosystem to the Snowflake ecosystem?
From the question, it seems that the incoming Oracle rows do not contain any SCD2 type columns and that when each row inserted into snowflake is to be inserted using SCD2 type functionality.
SCD2 columns can have a specific meaning to the business, such that 'exp_ts' could be actual date or a business date. Snowflake 'Stage' does not include SCD2 functionality. This is usually the role of an ETL framework, not that of a 'fast/bulk' load utility.
Most ETL vendors have SCD2 functions as a part of their offering.
I did following steps to perform SCD2.
Loaded Oracle_a table data into TEMPORARY scd2_temp table
Performed update on snow_a to expire "changed records" by joining
key cols and checking the rest of attributes
Inserted into snow_a table from TEMPORARY scd2_temp to snow_a table
Here's a solution based on the following assumptions:
The source oracle table is not itself responsible for SCD2
processing (so Eff/Exp TS columns wouldn't be present on that
table).
There is an external process that is only Extracting/Loading delta
(new, updated) records into Snowflake.
The source oracle is not deleting records
First create the tables and add the first set of delta data:
CREATE OR REPLACE TABLE stg.cdc2_oracle_d (
key1 varchar(10),
key2 varchar(10),
key3 varchar(10),
attr1 varchar(8),
attr2 varchar(8));
CREATE OR REPLACE TABLE edw.cdc2_snowflake_d (
key1 varchar(10),
key2 varchar(10),
key3 varchar(10),
attr1 varchar(8),
attr2 varchar(8),
eff_ts TIMESTAMP_LTZ(0),
exp_ts TIMESTAMP_LTZ(0),
active_fl char(1));
INSERT INTO stg.cdc2_oracle_d VALUES
( 'PT_1', 'DL_1', 'RPT_1', 'Addr1a', 'APT_1.0'),
( 'PT_2', 'DL_2', 'RPT_2', 'Addr2a', 'APT_2.0'),
( 'PT_3', 'DL_3', 'RPT_3', 'Addr3a', 'APT_3.0');
Then run the following Transformation script:
BEGIN;
-- 1: insert new-new records from stg table that don't current exist in the edw table
INSERT INTO edw.cdc2_snowflake_d
SELECT
key1,
key2,
key3,
attr1,
attr2,
CURRENT_TIMESTAMP(0) AS eff_ts,
CAST('9999-12-31 23:59:59' AS TIMESTAMP) AS end_ts,
'Y' AS active_fl
FROM stg.cdc2_oracle_d stg
WHERE NOT EXISTS (
SELECT 1
FROM edw.cdc2_snowflake_d edw
WHERE edw.key1 = stg.key1
AND edw.key2 = stg.key2
AND edw.key3 = stg.key3
AND edw.active_fl = 'Y');
-- 2: insert new version of record from stg table where key current does exist in edw table
-- but only add if the attr columns are different, otherwise it's the same record
INSERT INTO edw.cdc2_snowflake_d
SELECT
stg.key1,
stg.key2,
stg.key3,
stg.attr1,
stg.attr2,
CURRENT_TIMESTAMP(0) AS eff_ts,
CAST('9999-12-31 23:59:59' AS TIMESTAMP) AS end_ts,
'T' AS active_fl -- set flat to Temporary setting
FROM stg.cdc2_oracle_d stg
JOIN edw.cdc2_snowflake_d edw ON edw.key1 = stg.key1 AND edw.key2 = stg.key2
AND edw.key3 = stg.key3 AND edw.active_fl = 'Y'
WHERE (stg.attr1 <> edw.attr1
OR stg.attr2 <> edw.attr2);
-- 3: deactive the current record where there is a new record from above step
-- and set the end_ts to 1 second prior to new record so there is no overlap in data
UPDATE edw.cdc2_snowflake_d old
SET old.active_fl = 'N',
old.exp_ts = DATEADD(SECOND, -1, new.eff_ts)
FROM edw.cdc2_snowflake_d new
WHERE old.key1 = new.key1
AND old.key2 = new.key2
AND old.key3 = new.key3
AND new.active_fl = 'T'
AND old.active_fl = 'Y';
-- 4: finally set all the temporary records to active
UPDATE cdc2_snowflake_d tmp
SET tmp.active_fl = 'Y'
WHERE tmp.active_fl = 'T';
COMMIT;
Review the results, then truncate & add new data and run the script again:
SELECT * FROM stg.cdc2_oracle_d;
SELECT * FROM edw.cdc2_snowflake_d ORDER BY 1,2,3,5;
TRUNCATE TABLE stg.cdc2_oracle_d;
INSERT INTO stg.cdc2_oracle_d VALUES
( 'PT_1', 'DL_1', 'RPT_1', 'Addr1a', 'APT_1.1'), -- record has updated attr2
( 'PT_2', 'DL_2', 'RPT_2', 'Addr2a', 'APT_2.0'), -- record has no changes
( 'PT_4', 'DL_4', 'RPT_4', 'Addr4a', 'APT_4.0'); -- new record
You'll see that PT_1 has 2 records w/ non-overlapping timestamps, only 1 is active.

TSQL Update Issue

Ok SQL Server fans I have an issue with a legacy stored procedure that sits inside of a SQL Server 2008 R2 Instance that I have inherited also with the PROD data which to say the least is horrible. Also, I can NOT make any changes to the data nor the table structures.
So here is my problem, the stored procedure in question runs daily and is used to update the employee table. As you can see from my example the incoming data (#New_Employees) contains the updated data and I need to use it to update the data in the Employee data is stored in the #Existing_Employees table. Throughout the years different formatting of the EMP_ID value has been used and must be maintained as is (I fought and lost that battle). Thankfully, I have been successfully in changing the format of the EMP_ID column in the #New_Employees table (Yeah!) and any new records will use this format thankfully!
So now you may see my problem, I need to update the ID column in the #New_Employees table with the corresponding ID from the #Existing_Employees table by matching (that's right you guessed it) by the EMP_ID columns. So I came up with an extremely hacky way to handle the disparate formats of the EMP_ID columns but it is very slow considering the number of rows that I need to process (1M+).
I thought of creating a staging table where I could simply cast the EMP_ID columns to an INT and then back to a NVARCHAR in each table to remove the leading zeros and I am sort of leaning that way but I wanted to see if there was another way to handle this dysfunctional data. Any constructive comments are welcome.
IF OBJECT_ID(N'TempDB..#NEW_EMPLOYEES') IS NOT NULL
DROP TABLE #NEW_EMPLOYEES
CREATE TABLE #NEW_EMPLOYEES(
ID INT
,EMP_ID NVARCHAR(50)
,NAME NVARCHAR(50))
GO
IF OBJECT_ID(N'TempDB..#EXISTING_EMPLOYEES') IS NOT NULL
DROP TABLE #EXISTING_EMPLOYEES
CREATE TABLE #EXISTING_EMPLOYEES(
ID INT PRIMARY KEY
,EMP_ID NVARCHAR(50)
,NAME NVARCHAR(50))
GO
INSERT INTO #NEW_EMPLOYEES
VALUES(NULL, '00123', 'Adam Arkin')
,(NULL, '00345', 'Bob Baker')
,(NULL, '00526', 'Charles Nelson O''Reilly')
,(NULL, '04321', 'David Numberman')
,(NULL, '44321', 'Ida Falcone')
INSERT INTO #EXISTING_EMPLOYEES
VALUES(1, '123', 'Adam Arkin')
,(2, '000345', 'Bob Baker')
,(3, '526', 'Charles Nelson O''Reilly')
,(4, '0004321', 'Ed Sullivan')
,(5, '02143', 'Frank Sinatra')
,(6, '5567', 'George Thorogood')
,(7, '0000123-1', 'Adam Arkin')
,(8, '7', 'Harry Hamilton')
-- First Method - Not Successful
UPDATE NE
SET ID = EE.ID
FROM
#NEW_EMPLOYEES NE
LEFT OUTER JOIN #EXISTING_EMPLOYEES EE
ON EE.EMP_ID = NE.EMP_ID
SELECT * FROM #NEW_EMPLOYEES
-- Second Method - Successful but Slow
UPDATE NE
SET ID = EE.ID
FROM
dbo.#NEW_EMPLOYEES NE
LEFT OUTER JOIN dbo.#EXISTING_EMPLOYEES EE
ON CAST(CASE WHEN NE.EMP_ID LIKE N'%[^0-9]%'
THEN NE.EMP_ID
ELSE LTRIM(STR(CAST(NE.EMP_ID AS INT))) END AS NVARCHAR(50)) =
CAST(CASE WHEN EE.EMP_ID LIKE N'%[^0-9]%'
THEN EE.EMP_ID
ELSE LTRIM(STR(CAST(EE.EMP_ID AS INT))) END AS NVARCHAR(50))
SELECT * FROM #NEW_EMPLOYEES
the number of rows that I need to process (1M+).
A million employees? Per day?
I think I would add a 3rd table:
create table #ids ( id INT not NULL PRIMARY KEY
, emp_id not NULL NVARCHAR(50) unique );
Populate that table using your LTRIM(STR(CAST, ahem, algorithm, and update Employees directly from a join of those three tables.
I recommend using ANSI update, not Microsoft's nonstandard update ... from because the ANSI version prevents nondeterministic results in cases where the FROM produces more than one row.

SEQUENCE in SQL Server 2008 R2

I need to know if there is any way to have a SEQUENCE or something like that, as we have in Oracle. The idea is to get one number and then use it as a key to save some records in a table. Each time we need to save data in that table, first we get the next number from the sequence and then we use the same to save some records. Is not an IDENTITY column.
For example:
[ID] [SEQUENCE ID] [Code] [Value]
1 1 A 232
2 1 B 454
3 1 C 565
Next time someone needs to add records, the next SEQUENCE ID should be 2, is there any way to do it? the sequence could be a guid for me as well.
As Guillelon points out, the best way to do this in SQL Server is with an identity column.
You can simply define a column as being identity. When a new row is inserted, the identity is automatically incremented.
The difference is that the identity is updated on every row, not just some rows. To be honest, think this is a much better approach. Your example suggests that you are storing both an entity and detail in the same table.
The SequenceId should be the primary identity key in another table. This value can then be used for insertion into this table.
This can be done using multiple ways, Following is what I can think of
Creating a trigger and there by computing the possible value
Adding a computed column along with a function that retrieves the next value of the sequence
Here is an article that presents various solutions
One possible way is to do something like this:
-- Example 1
DECLARE #Var INT
SET #Var = Select Max(ID) + 1 From tbl;
INSERT INTO tbl VALUES (#var,'Record 1')
INSERT INTO tbl VALUES (#var,'Record 2')
INSERT INTO tbl VALUES (#var,'Record 3')
-- Example 2
INSERT INTO #temp VALUES (1,2)
INSERT INTO #temp VALUES (1,2)
INSERT INTO ActualTable (col1, col2, sequence)
SELECT temp.*, (SELECT MAX(ID) + 1 FROM ActualTable)
FROM #temp temp
-- Example 3
DECLARE #var int
INSERT INTO ActualTable (col1, col2, sequence) OUTPUT #var = inserted.sequence VALUES (1, 2, (SELECT MAX(ID) + 1 FROM ActualTable))
The first two examples rely on batch updating. But based on your comment, I have added example 3 which is a single input initially. You can then use the sequence that was inserted to insert the rest of the records. If you have never used an output, please reply in comments and I will expand further.
I would isolate all of the above inside of a transactions.
If you were using SQL Server 2012, you could use the SEQUENCE operator as shown here.
Forgive me if syntax errors, don't have SSMS installed

Correlating multiple tables

I was trying to extend the above link http://blogs.msdn.com/b/biztalkcpr/archive/2009/10/05/inserting-parent-child-records-with-identity-column-using-wcf-sql-adapter-in-one-transaction.aspx#comments
My problem is :
I have three table....the first two tables have id as IDENTITY... I need to get the id of first table in 2nd table
Then I need to get id of 1st and 2nd table in 3rd table ..... I was able to get id of first table into 2nd table
But I'm not able to get id of 2nd table into 3rd table...........I am using WCF SQL adapter to consume the stored procedure and my stored procedure looks like this
CREATE Procedure [dbo].[InsertHeader]
(
#parHeader As Header READONLY,
#parHeader_Details As HeaderDetails READONLY,
#parHeader_Details1 As HeaderDetails1 READONLY
)
AS
SET NOCOUNT ON
BEGIN
DECLARE #id int, #id1 int
INSERT INTO EDI834_5010_Header([File_Name_EDI834], [ST01], [ST02], [ST03],
[BGN01__TransactionSetPurposeCode],
[BGN02__TransactionSetIdentifierCode],
[BGN03__TransactionSetCreationDate],
[BGN04__TransactionSetCreationTime],
[BGN08__ActionCode], [SE01], [SE02])
SELECT
[File_Name_EDI834], [ST01], [ST02], [ST03],
[BGN01__TransactionSetPurposeCode], [BGN02__TransactionSetIdentifierCode],
[BGN03__TransactionSetCreationDate], [BGN04__TransactionSetCreationTime],
[BGN08__ActionCode], [SE01], [SE02]
FROM #parHeader;
SET #id = ##IDENTITY;
INSERT INTO EDI834_5010_2000([Header_Id], [INS01__InsuredIndicator],
[INS02__IndividualRelationshipCode],
[INS03__MaintenanceTypeCode],
[INS04__MaintenanceReasonCode],
[INS05__BenefitStatusCode])
SELECT #id, [INS01__InsuredIndicator], [INS02__IndividualRelationshipCode],
[INS03__MaintenanceTypeCode], [INS04__MaintenanceReasonCode],
[INS05__BenefitStatusCode]
FROM #parHeader_Details;
SET #id1 = ##IDENTITY;
INSERT INTO EDI834_5010_2300Loop([Id_Header_Id], [Id_Loop2000],
[HD01_MaintenanceTypeCode], [HD03_InsuranceLineCode],
[HD04_PlanCoverageDescription])
SELECT #id, #id1,
HD01_MaintenanceTypeCode, HD03_InsuranceLineCode,
HD04_PlanCoverageDescription
FROM #parHeader_Details1;
RETURN #id1;
END
What do I need to change in my stored procedure to get the id of 2nd table into 3rd....... there are so many looping in xml so I need to get the appropriate ids in 3rd table
And my data looks like this
<Header details>
<Header_Details1> data </Header_Details>
<Header_Details1> data </Header_Details>
<Header_Details1> data </Header_Details>
<Header_details>
<Header details>
<Header_Details1> data </Header_Details>
<Header_Details1> data </Header_Details>
<Header_Details1> data </Header_Details>
<Header_details>
Well, your main problem is: you're not just inserting a single row - but a whole bunch of rows!
So while you can use ##IDENTITY to get the last inserted identity value (btw: I would recommend using SCOPE_IDENTITY() instead - see here as for why) - this will only work for a single row!
What you need is a mechanism to output multiple inserted identities - use the OUTPUT clause:
DECLARE #InsertedIDs TABLE (ID1 INT, ID2 INT)
INSERT INTO EDI834_5010_Header(......)
OUTPUT Inserted.ID INTO #InsertedIDs(ID1)
SELECT
.......
FROM #parHeader;
For the first statement, this will output all inserted identity values into the table variable #InsertedIDs.
Now for your second table - is there any column that you can relate the first and second inserted ID's by?? You would need to capture the inserted identity values from the second INSERT into the same table variable, but you need to somehow know which ID1 to associate with with ID2 - and I quite frankly don't see how that would be done in your statements.....
But in the end, you would have a table variable with n rows of (ID1, ID2) which you can then use to insert into your third table.

Resources