T-SQL for Updating Rows with same value in a column - sql-server

I have a table, let's say it's called FavoriteFruits, with the columns NAME, FRUIT, and GUID. The table is already populated with names and fruits, for example:
NAME FRUIT GUID
John Apple NULL
John Orange NULL
John Grapes NULL
Peter Cantaloupe NULL
Peter Grapefruit NULL
Now I want to update the GUID column with a new GUID (using NEWID()), but I want the same GUID for each distinct name. So I want all the Johns to share one GUID, and both Peters to share another GUID that is different from the one used for the Johns. The result would look something like this:
NAME FRUIT GUID
John Apple f6172268-78b7-4c2b-8cd7-7a5ca20f6a01
John Orange f6172268-78b7-4c2b-8cd7-7a5ca20f6a01
John Grapes f6172268-78b7-4c2b-8cd7-7a5ca20f6a01
Peter Cantaloupe e3b1851c-1927-491a-803e-6b3bce9bf223
Peter Grapefruit e3b1851c-1927-491a-803e-6b3bce9bf223
Can I do that in an update statement without having to use a cursor? If so can you please give an example?
Thanks guys...

Updating through a CTE won't work, because NEWID() will be evaluated once per row rather than once per name. A table variable works: use it as the source from which to update the data. This is untested, but it will look something like:
DECLARE @n TABLE (Name varchar(10), Guid uniqueidentifier);
INSERT @n
SELECT Name, newid() AS Guid
FROM FavoriteFruits
GROUP BY Name;
UPDATE f
SET f.Guid = n.Guid
FROM @n n
JOIN FavoriteFruits f ON f.Name = n.Name;
This populates a table variable with one GUID per name, then joins it back to the original table and updates each row accordingly.
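The same two-step idea (materialize one value per name first, then copy it to every matching row) can be tried outside SQL Server. Below is a minimal sketch using Python's built-in sqlite3 module, assuming SQLite as a stand-in: SQLite has no NEWID() or table variables, so a TEMP table and lower(hex(randomblob(16))) (a random 32-hex-digit value, not a hyphenated GUID) play those roles. The helper name assign_guid_per_name is hypothetical.

```python
import sqlite3

def assign_guid_per_name(conn: sqlite3.Connection) -> None:
    """Give every FavoriteFruits row the same random id per distinct NAME."""
    cur = conn.cursor()
    # Step 1: one random value per name, stored before any row is updated
    # (plays the role of the @n table variable).
    cur.execute("CREATE TEMP TABLE n (name TEXT PRIMARY KEY, guid TEXT NOT NULL)")
    cur.execute(
        "INSERT INTO n (name, guid) "
        "SELECT NAME, lower(hex(randomblob(16))) FROM FavoriteFruits GROUP BY NAME"
    )
    # Step 2: copy the stored value back to every row with that name.
    cur.execute(
        "UPDATE FavoriteFruits "
        "SET GUID = (SELECT n.guid FROM n WHERE n.name = FavoriteFruits.NAME)"
    )
    cur.execute("DROP TABLE n")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FavoriteFruits (NAME TEXT, FRUIT TEXT, GUID TEXT)")
conn.executemany(
    "INSERT INTO FavoriteFruits (NAME, FRUIT) VALUES (?, ?)",
    [("John", "Apple"), ("John", "Orange"), ("John", "Grapes"),
     ("Peter", "Cantaloupe"), ("Peter", "Grapefruit")],
)
assign_guid_per_name(conn)
for row in conn.execute("SELECT NAME, FRUIT, GUID FROM FavoriteFruits"):
    print(row)
```

Because the random value is generated while the temp table is filled, the UPDATE only copies stored values and cannot produce a fresh one per row.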

To clarify the comments regarding a table expression in the USING clause of a MERGE statement:
The following won't work because it'll evaluate per row:
MERGE INTO FavoriteFruits
USING (
SELECT NAME, NEWID() AS GUID
FROM FavoriteFruits
GROUP
BY NAME
) AS source
ON source.NAME = FavoriteFruits.NAME
WHEN MATCHED THEN
UPDATE
SET GUID = source.GUID;
But the following, using a table variable, will work:
DECLARE @n TABLE
(
NAME VARCHAR(10) NOT NULL UNIQUE,
GUID UNIQUEIDENTIFIER NOT NULL UNIQUE
);
INSERT INTO @n (NAME, GUID)
SELECT NAME, NEWID()
FROM FavoriteFruits
GROUP
BY NAME;
MERGE INTO FavoriteFruits
USING @n AS source
ON source.NAME = FavoriteFruits.NAME
WHEN MATCHED THEN
UPDATE
SET GUID = source.GUID;

There's a single-statement solution too, which, however, has some limitations. The idea is to use OPENQUERY(), like this:
UPDATE FavoriteFruits
SET GUID = n.GUID
FROM (
SELECT NAME, GUID
FROM OPENQUERY(
linkedserver,
'SELECT NAME, NEWID() AS GUID FROM database.schema.FavoriteFruits GROUP BY NAME'
)
) n
WHERE FavoriteFruits.NAME = n.NAME
This solution requires a linked server that points back to the same server. Another limitation is that it can't be used with table variables or local temporary tables, because the linked-server query runs in a separate session; global temporary tables work, as do regular tables.

Related

Snowflake - how to do multiple DML operations on same primary key in a specific order?

I am trying to set up continuous data replication into Snowflake. I receive the transactions that happened in the source system, and I need to apply them in Snowflake in the same order as in the source system. I am trying to use MERGE for this, but when there are multiple operations on the same key in the source system, MERGE does not work correctly: it either misses an operation or fails with a "duplicate row detected during DML operation" error.
Please note that the transactions need to be applied in the exact order; it is not possible to just take the latest transaction for a key and apply that (for example, if a record was INSERTED and then UPDATED, it must also be inserted first and then updated in Snowflake, even though the insert is only a transient state).
Here is an example:
create or replace table employee_source (
id int,
first_name varchar(255),
last_name varchar(255),
operation_name varchar(255),
binlogkey integer
);
create or replace table employee_destination ( id int, first_name varchar(255), last_name varchar(255) );
insert into employee_source values (1,'Wayne','Bells','INSERT',11);
insert into employee_source values (1,'Wayne','BellsT','UPDATE',12);
insert into employee_source values (2,'Anthony','Allen','INSERT',13);
insert into employee_source values (3,'Eric','Henderson','INSERT',14);
insert into employee_source values (4,'Jimmy','Smith','INSERT',15);
insert into employee_source values (1,'Wayne','Bellsa','UPDATE',16);
insert into employee_source values (1,'Wayner','Bellsat','UPDATE',17);
insert into employee_source values (2,'Anthony','Allen','DELETE',18);
MERGE into employee_destination as T using (select * from employee_source order by binlogkey)
AS S
ON T.id = s.id
when not matched
And S.operation_name = 'INSERT' THEN
INSERT (id,
first_name,
last_name)
VALUES (
S.id,
S.first_name,
S.last_name)
when matched AND S.operation_name = 'UPDATE'
THEN
update set T.first_name = S.first_name, T.last_name = S.last_name
When matched
And S.operation_name = 'DELETE' THEN DELETE;
I expect employee id 1 to have the last name Bellsat in the employee_destination table after all rows are processed. Likewise, employee id 2 should not appear in employee_destination.
Is there an alternative to MERGE to achieve this? Basically, I want to apply every single DML operation in the same order as the source (using the binlogkey column for ordering).
thanks.
You need to manipulate your source data to ensure that you only have one record per key; otherwise the join will be non-deterministic, and (depending on your settings) MERGE will either raise an error or update using an arbitrary one of the matching source records. This is covered in the documentation here https://docs.snowflake.com/en/sql-reference/sql/merge.html#duplicate-join-behavior.
In any case, why would you want to update a record only for it to be overwritten by a later update? That would be very inefficient.
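The question asks for each operation to be applied individually in binlogkey order. Below is a minimal sketch of that row-by-row replay, using Python's sqlite3 module with an in-memory table as a stand-in for Snowflake (an assumption for illustration; the ordering logic is the point, not the engine). The function name replay is hypothetical. It issues one DML statement per change, which is the per-operation cost questioned above.

```python
import sqlite3

# Change rows from the question: (id, first_name, last_name, operation_name, binlogkey).
CHANGES = [
    (1, "Wayne", "Bells", "INSERT", 11),
    (1, "Wayne", "BellsT", "UPDATE", 12),
    (2, "Anthony", "Allen", "INSERT", 13),
    (3, "Eric", "Henderson", "INSERT", 14),
    (4, "Jimmy", "Smith", "INSERT", 15),
    (1, "Wayne", "Bellsa", "UPDATE", 16),
    (1, "Wayner", "Bellsat", "UPDATE", 17),
    (2, "Anthony", "Allen", "DELETE", 18),
]

def replay(conn, changes):
    """Apply each change as its own DML statement, strictly in binlogkey order."""
    for emp_id, first, last, op, _key in sorted(changes, key=lambda c: c[4]):
        if op == "INSERT":
            conn.execute("INSERT INTO employee_destination VALUES (?, ?, ?)",
                         (emp_id, first, last))
        elif op == "UPDATE":
            conn.execute("UPDATE employee_destination SET first_name = ?, last_name = ? "
                         "WHERE id = ?", (first, last, emp_id))
        elif op == "DELETE":
            conn.execute("DELETE FROM employee_destination WHERE id = ?", (emp_id,))
        else:
            raise ValueError(f"unknown operation {op!r}")
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_destination (id INTEGER, first_name TEXT, last_name TEXT)")
replay(conn, CHANGES)
print(conn.execute("SELECT * FROM employee_destination ORDER BY id").fetchall())
# [(1, 'Wayner', 'Bellsat'), (3, 'Eric', 'Henderson'), (4, 'Jimmy', 'Smith')]
```

The final state is the one the question expects: id 1 ends with the last name Bellsat and id 2 is gone.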
Since your updates appear to include the new values for all columns, you can use a window function to keep just the latest incoming change per key, and then merge those results into the target table. For example, the SELECT for that merge (with the window function keeping only the latest change) would look like this:
with SOURCE_DATA as
(
select COLUMN1::int ID
,COLUMN2::string FIRST_NAME
,COLUMN3::string LAST_NAME
,COLUMN4::string OPERATION_NAME
,COLUMN5::int PROCESSING_ORDER
from values
(1,'Wayne','Bells','INSERT',11),
(1,'Wayne','BellsT','UPDATE',12),
(2,'Anthony','Allen','INSERT',13),
(3,'Eric','Henderson','INSERT',14),
(4,'Jimmy','Smith','INSERT',15),
(1,'Wayne','Bellsa','UPDATE',16),
(1,'Wayne','Bellsat','UPDATE',17),
(2,'Anthony','Allen','DELETE',18)
)
select * from SOURCE_DATA
qualify row_number() over (partition by ID order by PROCESSING_ORDER desc) = 1
That will produce a result set that has only the changes required to merge into the target table:
ID FIRST_NAME LAST_NAME OPERATION_NAME PROCESSING_ORDER
1 Wayne Bellsat UPDATE 17
2 Anthony Allen DELETE 18
3 Eric Henderson INSERT 14
4 Jimmy Smith INSERT 15
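You can then relax the WHEN NOT MATCHED condition so that it accepts any operation except DELETE, instead of only INSERT. If a row is listed as an UPDATE but is not in the target table, it's because it was inserted by an earlier operation within the same batch of changes. Excluding DELETE still matters: an id such as 2, which was inserted and deleted within the batch, must not be inserted into the target.
For the WHEN MATCHED clauses, you can use operation_name to decide whether the row should be updated or deleted.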
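A minimal sketch of the reduce-then-merge approach, assuming SQLite (via Python's sqlite3) as a stand-in for Snowflake and using the question's change rows. SQLite has no QUALIFY and no MERGE, so ROW_NUMBER() is filtered in a subquery and the WHEN MATCHED / WHEN NOT MATCHED branches are emulated in Python; the names LATEST_SQL and merge_latest are hypothetical. The not-matched branch skips DELETE, so an id created and removed within the batch never reaches the target.

```python
import sqlite3

CHANGES = [  # (id, first_name, last_name, operation_name, binlogkey), from the question
    (1, "Wayne", "Bells", "INSERT", 11), (1, "Wayne", "BellsT", "UPDATE", 12),
    (2, "Anthony", "Allen", "INSERT", 13), (3, "Eric", "Henderson", "INSERT", 14),
    (4, "Jimmy", "Smith", "INSERT", 15), (1, "Wayne", "Bellsa", "UPDATE", 16),
    (1, "Wayner", "Bellsat", "UPDATE", 17), (2, "Anthony", "Allen", "DELETE", 18),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_source (id INTEGER, first_name TEXT, last_name TEXT, "
             "operation_name TEXT, binlogkey INTEGER)")
conn.execute("CREATE TABLE employee_destination (id INTEGER PRIMARY KEY, "
             "first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO employee_source VALUES (?, ?, ?, ?, ?)", CHANGES)

# Latest change per id: ROW_NUMBER() filtered in a subquery (SQLite lacks QUALIFY).
LATEST_SQL = """
    SELECT id, first_name, last_name, operation_name
    FROM (
        SELECT id, first_name, last_name, operation_name,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY binlogkey DESC) AS rn
        FROM employee_source
    )
    WHERE rn = 1
"""

def merge_latest(conn):
    """Emulate the MERGE branches for each reduced change row."""
    for emp_id, first, last, op in conn.execute(LATEST_SQL).fetchall():
        matched = conn.execute("SELECT 1 FROM employee_destination WHERE id = ?",
                               (emp_id,)).fetchone() is not None
        if matched and op == "DELETE":   # WHEN MATCHED AND operation is DELETE
            conn.execute("DELETE FROM employee_destination WHERE id = ?", (emp_id,))
        elif matched:                    # WHEN MATCHED otherwise: overwrite the values
            conn.execute("UPDATE employee_destination SET first_name = ?, last_name = ? "
                         "WHERE id = ?", (first, last, emp_id))
        elif op != "DELETE":             # WHEN NOT MATCHED AND operation is not DELETE
            conn.execute("INSERT INTO employee_destination VALUES (?, ?, ?)",
                         (emp_id, first, last))
        # Not matched and DELETE: created and removed within the batch, so skip it.
    conn.commit()

merge_latest(conn)
print(conn.execute("SELECT * FROM employee_destination ORDER BY id").fetchall())
# [(1, 'Wayner', 'Bellsat'), (3, 'Eric', 'Henderson'), (4, 'Jimmy', 'Smith')]
```

This assumes each change row carries the full new values, as the answer notes; if changes were partial, keeping only the latest row would lose data.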

Creating a unique ID using the first 3 characters of the last name and a sequence number

I have an employee table in SQL Server, and I want every employee to have a unique ID made up of the first three letters of their last name plus a sequence number.
I don't remember how to do this at all; I haven't used SQL in a year and have forgotten almost everything.
Can anyone refresh my memory on how to do this? Google has been no help on this matter. Thanks
Firstly, I suggest that the numeric portion of your identifier be unique in and of itself, in case the employee gets married and changes their last name. The prefix can still appear to the left of it, but it should not be needed for uniqueness.
If you agree with this design, then you can simply use a numeric identity column on the Employee table and combine that with the last name when retrieving the data, using a computed column. I suggest you seed the identity with a value that has enough digits to keep your identifier lengths consistent, so for example to support 90,000 employees you can use a seed of 10,000 which ensures all identifiers are 8 characters long (three letters of the name plus five numeric).
Simple example:
CREATE TABLE Employee
(
EmployeeNo int IDENTITY(10000,1) PRIMARY KEY,
LastName VarChar(64),
EmployeeID AS SUBSTRING(UPPER(LastName), 1, 3) + RIGHT('0000' + CONVERT(char(5), EmployeeNo), 5)
)
INSERT Employee (LastName) VALUES ('Smith')
SELECT * FROM Employee
Results:
EmployeeNo LastName EmployeeID
10000 Smith SMI10000
For the purposes of your SQL and table design, your tables should all use EmployeeNo as foreign key, since it is compact and unique. Apply the three-letter prefix during data retrieval and only for customer-facing purposes.
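A minimal sketch of the "unique number, prefix applied at read time" design, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server. SQLite has no IDENTITY(10000,1) seed, so the first row is given EmployeeNo 10000 explicitly and later rows get max(rowid) + 1; printf('%05d', ...) replaces RIGHT/CONVERT. The second employee, Jones, is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EmployeeNo stands in for the IDENTITY column; seeding is done by the first explicit insert.
conn.execute("CREATE TABLE Employee (EmployeeNo INTEGER PRIMARY KEY, LastName TEXT)")
conn.execute("INSERT INTO Employee (EmployeeNo, LastName) VALUES (10000, 'Smith')")
conn.execute("INSERT INTO Employee (LastName) VALUES ('Jones')")  # gets 10001

# The display ID is derived when reading, like the computed column in the answer.
rows = conn.execute("""
    SELECT EmployeeNo, LastName,
           upper(substr(LastName, 1, 3)) || printf('%05d', EmployeeNo) AS EmployeeID
    FROM Employee
    ORDER BY EmployeeNo
""").fetchall()
for row in rows:
    print(row)
# (10000, 'Smith', 'SMI10000')
# (10001, 'Jones', 'JON10001')
```

Only EmployeeNo is stored, so a last-name change alters the displayed prefix without breaking any foreign key.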
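@John Wu is right. However, if you don't want to rely on the employee number, you can use the NEWID() function, which always generates a unique value (a GUID). Be aware that NEWID() is non-deterministic, so in a non-persisted computed column like the one below it is re-evaluated every time the column is read, and Empid will not stay the same between queries; to keep a stable value, generate it once and store it in a regular column. Below is the code.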
CREATE TABLE EmployeeDetails
(
EmployeeCode int IDENTITY(1,1),
FirstName varchar(50),
LastName Varchar(50),
Empid as left(Lastname,3) + convert(varchar(500), newid())
)
INSERT EmployeeDetails VALUES ('Atul', 'Jain')

SQL Script: Updating a column with another table pivoting on an ID

I have two SQL Server tables: ORDERS and DELIVERIES.
I would like to update the ORDERS table with a value from DELIVERIES. The ORDERS PK (OrderID) is common to both tables. Also, I would like to restrict the action to a specific AccountID (within ORDERS).
ORDERS table:
OrderID | AccountID | AnalysisField1
DELIVERIES table:
DeliveryID | OrderID | AddressName
I want to update ORDERS.AnalysisField1 with the value from DELIVERIES.AddressName (linked by OrderID) but only where ORDERS.AccountID = '12345'
Please help. JM
Try something like this:
UPDATE dbo.Orders
SET AnalysisField1 = d.Addressname
FROM dbo.Deliveries d
WHERE
d.OrderID = dbo.Orders.OrderID
AND dbo.Orders.AccountID = '12345'
If your AccountID column is of a numerical type (which the ID suffix would suggest), then you should not put unnecessary single quotes around the value in the WHERE clause:
AND dbo.Orders.AccountID = 12345
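A minimal sketch of the same restricted update, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server. It uses the portable correlated-subquery form instead of UPDATE ... FROM; the sample rows are hypothetical, and it assumes at most one delivery per order (with several, both this form and UPDATE ... FROM pick an arbitrary AddressName).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, AccountID TEXT, AnalysisField1 TEXT);
    CREATE TABLE Deliveries (DeliveryID INTEGER PRIMARY KEY, OrderID INTEGER, AddressName TEXT);
    INSERT INTO Orders VALUES (1, '12345', NULL), (2, '12345', NULL), (3, '99999', NULL);
    INSERT INTO Deliveries VALUES (10, 1, 'Warehouse A'), (11, 2, 'Depot B'), (12, 3, 'Office C');
""")

# Only orders for account 12345 that have a matching delivery are touched.
conn.execute("""
    UPDATE Orders
    SET AnalysisField1 = (SELECT d.AddressName FROM Deliveries d
                          WHERE d.OrderID = Orders.OrderID)
    WHERE AccountID = '12345'
      AND EXISTS (SELECT 1 FROM Deliveries d WHERE d.OrderID = Orders.OrderID)
""")
conn.commit()
print(conn.execute("SELECT OrderID, AnalysisField1 FROM Orders ORDER BY OrderID").fetchall())
# [(1, 'Warehouse A'), (2, 'Depot B'), (3, None)]
```

The EXISTS filter mirrors the inner-join semantics of UPDATE ... FROM: orders without a delivery keep their current value instead of being set to NULL.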

How to customise ordering of results (i.e. something other than alphabetical order for strings)

I have a table "Category" in SQL Server 2008. It has two columns, ID and Name. I have inserted four names into it:
1.Case Report
2.Original Article
3.Letter to Author
4.Submitted Article
I used the following query to show the table data:
select * from Category order by Name desc
It returns:
4.Submitted Article
2.Original Article
3.Letter to Author
1.Case Report
But I want the rows in this order:
2.Original Article
1.Case Report
3.Letter to Author
4.Submitted Article
Can someone please help me?
SELECT *
FROM Category
ORDER BY CASE
WHEN NAME = 'Original Article' THEN 1
WHEN NAME = 'Case Report' THEN 2
WHEN NAME = 'Letter to Author' THEN 3
ELSE 4 -- unlisted names sort last; without ELSE they would be NULL and sort first
END ASC
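A minimal sketch of the CASE-based ordering, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server; the table contents follow the question. It includes an ELSE branch because, without it, any name not listed becomes NULL, and NULLs sort first in ascending order in both SQLite and SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Category (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Category VALUES (?, ?)", [
    (1, "Case Report"), (2, "Original Article"),
    (3, "Letter to Author"), (4, "Submitted Article"),
])

CUSTOM_ORDER_SQL = """
    SELECT ID, Name
    FROM Category
    ORDER BY CASE
        WHEN Name = 'Original Article' THEN 1
        WHEN Name = 'Case Report' THEN 2
        WHEN Name = 'Letter to Author' THEN 3
        ELSE 4  -- everything else sorts last instead of becoming NULL
    END ASC
"""
rows = conn.execute(CUSTOM_ORDER_SQL).fetchall()
print(rows)
# [(2, 'Original Article'), (1, 'Case Report'), (3, 'Letter to Author'), (4, 'Submitted Article')]
```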
Given that you do not want alphabetical sorting, I would suggest adding another column to order by. For example, create a table like this:
CREATE TABLE MY_TABLE (
ID int PRIMARY KEY,
NAME varchar(50),
SORT_ORDER int
)
You would populate the SORT_ORDER column data to match the ordering you need and can then get the sorted data with:
SELECT * FROM MY_TABLE ORDER BY SORT_ORDER
Hope that helps.
It looks like you are trying to do a custom sort. In that case you can add a third column, e.g. SortID int, and then order the result set by SortID, e.g.:
SELECT ID, Name
FROM TableName
ORDER BY SortID
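A minimal sketch of the sort-column design, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server. The SortID values are hypothetical, chosen to give the order the question asks for; ID is added as a tie-breaker. The function name ordered_names is hypothetical. The second half shows the design benefit: changing the order is a data update, and the query stays the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Category (ID INTEGER PRIMARY KEY, Name TEXT, SortID INTEGER)")
conn.executemany("INSERT INTO Category VALUES (?, ?, ?)", [
    (1, "Case Report", 2),
    (2, "Original Article", 1),
    (3, "Letter to Author", 3),
    (4, "Submitted Article", 4),
])

def ordered_names(conn):
    # ID breaks ties so rows with equal SortID still come back in a stable order.
    return [name for (name,) in conn.execute("SELECT Name FROM Category ORDER BY SortID, ID")]

print(ordered_names(conn))
# ['Original Article', 'Case Report', 'Letter to Author', 'Submitted Article']

# Reordering is a data change, not a query change: move 'Letter to Author' to the top.
conn.execute("UPDATE Category SET SortID = 0 WHERE Name = 'Letter to Author'")
print(ordered_names(conn))
# ['Letter to Author', 'Original Article', 'Case Report', 'Submitted Article']
```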

SQL Server: complex operation unique code for groups of rows

I have a table where most of the columns are filled with data. In that table there are rows whose values in 4 particular columns are duplicated. I want to do something like this:
Check all rows, group them by these four columns (one group = one unique combination of values in these 4 columns), and assign each group a UniqueCode in the last column. The UniqueCode is per group.
So, when I have something like this:
Name Street HouseNumber PostCode UniqueCode
Cos Cos Cos Cos
Cos Cos Cos Cos
I want all rows in a group to get the same code. I want to write a query that clears all current UniqueCode values and fills in new ones, and a trigger (is that the right term?) that does the same for newly added rows.
Is it possible to implement this behavior in SQL?
Or do I need to do it in my program's code?
Can you help me?
Sorry for my bad English.
Assuming you have a table similar to this one:
create table Persons
(
ID int identity primary key,
Name varchar(100),
Street varchar(100),
HouseNumber int,
PostCode varchar(100),
UniqueCode uniqueidentifier
)
You would create a trigger that finds already entered Persons having the same composite key:
create trigger AssignPersonGroup on Persons
after insert, update
as
set nocount on
update Persons
set UniqueCode =
isnull(
(select top 1 UniqueCode from Persons
where UniqueCode is not null
and Persons.ID <> Inserted.ID -- don't reuse this row's own, possibly stale, code after an update
and Inserted.Name = Persons.Name
and Inserted.Street = Persons.Street
and Inserted.HouseNumber = Persons.HouseNumber
and Inserted.PostCode = Persons.PostCode)
, newid())
from Inserted inner join Persons
on Inserted.ID = Persons.ID
And assuming this would be the data:
insert into persons (Name, Street, HouseNumber, PostCode)
values ('Zagor', 'Darkwood', 23, '01010')
insert into persons (Name, Street, HouseNumber, PostCode)
values ('Zagor', 'Darkwood', 23, '01010')
insert into persons (Name, Street, HouseNumber, PostCode)
values ('Chico', 'Darkwood', 23, '01010')
Then
select *
from Persons
Would deliver:
ID Name Street HouseNumber PostCode UniqueCode
1 Zagor Darkwood 23 01010 A113D12D-F730-42DD-B3EE-AC33E34C0679
2 Zagor Darkwood 23 01010 A113D12D-F730-42DD-B3EE-AC33E34C0679
3 Chico Darkwood 23 01010 FD0739AF-525C-42C2-B929-0AB8EEAC3A73
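A minimal sketch of the same trigger logic, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server. SQLite triggers are row-level, so NEW refers to the single inserted row instead of the Inserted table, and lower(hex(randomblob(16))) replaces NEWID(). This sketch covers INSERT only; the original trigger also fires on UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (
    ID INTEGER PRIMARY KEY,
    Name TEXT, Street TEXT, HouseNumber INTEGER, PostCode TEXT,
    UniqueCode TEXT
);
-- Reuse the code of an existing row with the same four values, else make a new one.
CREATE TRIGGER AssignPersonGroup AFTER INSERT ON Persons
BEGIN
    UPDATE Persons
    SET UniqueCode = coalesce(
        (SELECT p.UniqueCode FROM Persons p
         WHERE p.UniqueCode IS NOT NULL
           AND p.ID <> NEW.ID
           AND p.Name = NEW.Name AND p.Street = NEW.Street
           AND p.HouseNumber = NEW.HouseNumber AND p.PostCode = NEW.PostCode
         LIMIT 1),
        lower(hex(randomblob(16))))
    WHERE ID = NEW.ID;
END;
""")
conn.executemany(
    "INSERT INTO Persons (Name, Street, HouseNumber, PostCode) VALUES (?, ?, ?, ?)",
    [("Zagor", "Darkwood", 23, "01010"),
     ("Zagor", "Darkwood", 23, "01010"),
     ("Chico", "Darkwood", 23, "01010")],
)
for row in conn.execute("SELECT * FROM Persons ORDER BY ID"):
    print(row)
```

The two Zagor rows share a code and Chico gets a different one, matching the result table above (the codes themselves are random).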
Your existing data would be updated if you execute the following two updates:
update persons
set uniquecode = newid()
update Persons
set uniqueCode=
(
select top 1 uniqueCode
from Persons groups
where UniqueCode is not null
and groups.Name = Persons.Name
and groups.Street = Persons.Street
and groups.HouseNumber = Persons.HouseNumber
and groups.PostCode = Persons.PostCode
order by ID
)
They are separate statements because SQL Server evaluates the join before the scalar NEWID() expression. My first attempt was a single update joining Persons to a derived table that grouped on all four columns and added a NEWID() column, but because the join to Persons was executed first, the IDs were unique per row again.
Disable the trigger before doing this and re-enable it afterwards. UniqueCode should not be set by the program, only by the trigger. You will also want a composite index on (Name, Street, HouseNumber, PostCode) and another one on UniqueCode.
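A minimal sketch of the two-step backfill, assuming SQLite via Python's sqlite3 as a stand-in for SQL Server (no trigger is defined here, so there is nothing to disable). The fourth row is hypothetical, added to show a second Chico group. The lowest-ID row in each group keeps its own code, so the result does not depend on the order in which rows are updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Persons (ID INTEGER PRIMARY KEY, Name TEXT, Street TEXT,
                HouseNumber INTEGER, PostCode TEXT, UniqueCode TEXT)""")
conn.executemany(
    "INSERT INTO Persons (Name, Street, HouseNumber, PostCode) VALUES (?, ?, ?, ?)",
    [("Zagor", "Darkwood", 23, "01010"), ("Zagor", "Darkwood", 23, "01010"),
     ("Chico", "Darkwood", 23, "01010"), ("Chico", "Darkwood", 24, "01010")],
)

# Step 1: give every row its own random code (stand-in for newid()).
conn.execute("UPDATE Persons SET UniqueCode = lower(hex(randomblob(16)))")
# Step 2: copy the code of the lowest-ID row in each group to the whole group.
conn.execute("""
    UPDATE Persons
    SET UniqueCode = (
        SELECT g.UniqueCode FROM Persons g
        WHERE g.Name = Persons.Name AND g.Street = Persons.Street
          AND g.HouseNumber = Persons.HouseNumber AND g.PostCode = Persons.PostCode
        ORDER BY g.ID
        LIMIT 1)
""")
conn.commit()
codes = dict(conn.execute("SELECT ID, UniqueCode FROM Persons ORDER BY ID").fetchall())
print(codes)
```

Note that a row with NULL in any of the four columns would end up with a NULL code, because NULL = NULL is not true in SQL; the T-SQL version has the same behavior.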
That aside, I have seen your other question about this project. What are you trying to do? I'm unsure about the nature of the relationship between the primary and high school tables. If you want to connect them by a particular person, then you should have a person table with an identity primary key. Both tables would reference this one, and you would have no problems identifying someone's academic path :-)
