SQL Server Equivalent of Oracle 'CONNECT BY PRIOR', and 'ORDER SIBLINGS BY' - sql-server

I've got this Oracle code structure I'm trying to convert to SQL Server 2008 (Note: I have used generic names, enclosed column names and table names within square brackets '[]', and done some formatting to make the code more readable):
SELECT [col#1], [col#2], [col#3], ..., [col#n], [LEVEL]
FROM (SELECT [col#1], [col#2], [col#3], ..., [col#n]
FROM [TABLE_1]
WHERE ... )
CONNECT BY PRIOR [col#1] = [col#2]
START WITH [col#2] IS NULL
ORDER SIBLINGS BY [col#3]
What is the SQL Server equivalent template of the above code?
Specifically, I'm struggling with the LEVEL, and 'ORDER SIBLINGS BY' Oracle constructs.
Note:
The above "code" is the final output from a set of Oracle procedures. Basically, the 'WHERE' clause is built up dynamically and changes depending on various parameters passed. The code block starting with 'CONNECT BY PRIOR' is hard-coded.
For Reference:
The Simulation of CONNECT BY PRIOR of ORACLE in SQL SERVER article comes close, but it does not explain how to handle the 'LEVEL' and the 'ORDER SIBLINGS' constructs. ... And my mind is getting in a twist!
SELECT name
FROM emp
START WITH name = 'Joan'
CONNECT BY PRIOR empid = mgrid
equates to:
WITH n(empid, name) AS
(SELECT empid, name
FROM emp
WHERE name = 'Joan'
UNION ALL
SELECT nplus1.empid, nplus1.name
FROM emp as nplus1, n
WHERE n.empid = nplus1.mgrid)
SELECT name FROM n
If I have an initial template to work from, it will go a long way to helping me construct SQL Server stored procs to build up a correct T-SQL statement.
Assistance will be much appreciated.

Simulating the LEVEL column
The level column can easily be simulated by incrementing a counter in the recursive part:
WITH tree (empid, name, level) AS (
SELECT empid, name, 1 as level
FROM emp
WHERE name = 'Joan'
UNION ALL
SELECT child.empid, child.name, parent.level + 1
FROM emp as child
JOIN tree parent on parent.empid = child.mgrid
)
SELECT name
FROM tree;
Simulating order siblings by
Simulating the order siblings by is a bit more complicated. Assuming we have a column sort_order that defines the order of elements per parent (not the overall sort order - because then order siblings wouldn't be necessary) then we can create a column which gives us an overall sort order:
WITH tree (empid, name, level, sort_path) AS (
SELECT empid, name, 1 as level,
cast('/' + right('000000' + CONVERT(varchar, sort_order), 6) as varchar(max))
FROM emp
WHERE name = 'Joan'
UNION ALL
SELECT child.empid, child.name, parent.level + 1,
parent.sort_path + '/' + right('000000' + CONVERT(varchar, child.sort_order), 6)
FROM emp as child
JOIN tree parent on parent.empid = child.mgrid
)
SELECT *
FROM tree
order by sort_path;
The expression for the sort_path looks so complicated because SQL Server (at least the version you are using) does not have a simple function to format a number with leading zeros. In Postgres I would use an integer array so that the conversion to varchar isn't necessary - but that doesn't work in SQL Server either.

The option given by the user "a_horse_with_no_name" worked for me. I changed the code and applied it to a menu generator query and it worked the first time. Here is the code:
WITH tree(option_id,
option_description,
option_url,
option_icon,
option_level,
sort_path)
AS (
SELECT ppo.option_id,
ppo.option_description,
ppo.option_url,
ppo.option_icon,
1 AS option_level,
CAST('/' + RIGHT('00' + CONVERT(VARCHAR, ppo.option_index), 6) AS VARCHAR(MAX))
FROM security.options_table_name ppo
WHERE ppo.option_parent_id IS NULL
UNION ALL
SELECT co.option_id,
co.option_description,
co.option_url,
co.option_icon,
po.option_level + 1,
po.sort_path + '/' + RIGHT('00' + CONVERT(VARCHAR, co.option_index), 6)
FROM security.options_table_name co,
tree AS po
WHERE po.option_id = co.option_parent_id)
SELECT *
FROM tree
ORDER BY sort_path;

to get dates for last 10 days:
SELECT DISTINCT RecordDate = DATEADD(DAY,-number,CAST(GETDATE() AS DATE))
FROM master..[spt_values]
WHERE number BETWEEN 1 AND 10

Related

Expression to find multiple spaces in string

We handle a lot of sensitive data and I would like to mask passenger names using only the first and last letter of each name part and join these by three asterisks (***),
For example: the name 'John Doe' will become 'J***n D***e'
For a name that consists of two parts this is doable by finding the space using the expression:
LEFT(CardHolderNameFromPurchase, 1) +
'***' +
CASE WHEN CHARINDEX(' ', PassengerName) = 0
THEN RIGHT(PassengerName, 1)
ELSE SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) -1, 1) +
' ' +
SUBSTRING(PassengerName, CHARINDEX(' ', PassengerName) +1, 1) +
'***' +
RIGHT(PassengerName, 1)
END
However, the passenger name can have more than two parts, there is no real limit to it. How should can I find the indices of all spaces within an expression? Or should I maybe tackle this problem in a different way?
Any help or pointer is much appreciated!
This solution does what you want it to, but is really the wrong approach to use when trying to hide personally identifiable data, as per Gordon's explanation in his answer.
SQL:
declare #t table(n nvarchar(20));
insert into #t values('John Doe')
,('JohnDoe')
,('John Doe Two')
,('John Doe Two Three')
,('John O''Neill');
select n
,stuff((select ' ' + left(s.item,1) + '***' + right(s.item,1)
from dbo.fn_StringSplit4k(t.n,' ',null) as s
for xml path('')
),1,1,''
) as mask
from #t as t;
Output:
+--------------------+-------------------------+
| n | mask |
+--------------------+-------------------------+
| John Doe | J***n D***e |
| JohnDoe | J***e |
| John Doe Two | J***n D***e T***o |
| John Doe Two Three | J***n D***e T***o T***e |
| John O'Neill | J***n O***l |
+--------------------+-------------------------+
String splitting function based on Jeff Moden's Tally Table approach:
create function [dbo].[fn_StringSplit4k]
(
#str nvarchar(4000) = ' ' -- String to split.
,#delimiter as nvarchar(1) = ',' -- Delimiting value to split on.
,#num as int = null -- Which value to return, null returns all.
)
returns table
as
return
-- Start tally table with 10 rows.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Select the same number of rows as characters in #str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest #str length.
,t(t) as (select top (select len(isnull(#str,'')) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that follows the specified delimiter.
,s(s) as (select 1 union all select t+1 from t where substring(isnull(#str,''),t,1) = #delimiter)
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
,l(s,l) as (select s,isnull(nullif(charindex(#delimiter,isnull(#str,''),s),0)-s,4000) from s)
select rn
,item
from(select row_number() over(order by s) as rn
,substring(#str,s,l) as item
from l
) a
where rn = #num
or #num is null;
GO
If you consider PassengerName as sensitive information, then you should not be storing it in clear text in generally accessible tables. Period.
There are several different options.
One is to have reference tables for sensitive information. Any table that references this would have an id rather than the name. Viola. No sensitive information is available without access to the reference table, and that would be severely restricted.
A second method is a reversible compression algorithm. This would allow the the value to be gibberish, but with the right knowledge, it could be transformed back into a meaningful value. Typical methods for this are the public key encryption algorithms devised by Rivest, Shamir, and Adelman (RSA encoding).
If you want to do first and last letters of names, I would be really careful about Asian names. Many of them consist of two or three letters, when written in Latin script. That isn't much hiding. SQL Server does not have simple mechanisms to do this. You can write a user-defined function with a loop to manager the process. However, I view this as the least secure and least desirable approach.
This uses Jeff Moden's DelimitedSplit8K, as well as the new functionality in SQL Server 2017 STRING_AGG. As I don't know what version you're using, I've just gone "whole hog" and assumed you're using the latest version.
Jeff's function is invaluable here, as it returns the ordinal position, something which Microsoft have foolishly omitted from their own function, STRING_SPLIT (and didn't add in 2017 either). Ordinal position is key here, so we can't make use of the built in function.
WITH VTE AS(
SELECT *
FROM (VALUES ('John Doe'),('Jane Bloggs'),('Edgar Allan Poe'),('Mr George W. Bush'),('Homer J Simpson')) V(FullName)),
Masking AS (
SELECT *,
ISNULL(STUFF(Item, 2, LEN(item) -2,'***'), Item) AS MaskedPart
FROM VTE V
CROSS APPLY dbo.delimitedSplit8K(V.Fullname, ' '))
SELECT STRING_AGG(MaskedPart,' ') AS MaskedFullName
FROM Masking
GROUP BY Fullname;
Edit: Nevermind, OP has commented they are using 2008, so STRING_AGG is out of the question. #iamdave, however, has posted an answer which is very similar to my own, just do it the "old fashioned XML way".
Depending on your version of SQL Server, you may be able to use the built-in string split to rows on spaces in the name, do your string formatting, and then roll back up to name level using an XML path.
create table dataset (id int identity(1,1), name varchar(50));
insert into dataset (name) values
('John Smith'),
('Edgar Allen Poe'),
('One Two Three Four');
with split as (
select id, cs.Value as Name
from dataset
cross apply STRING_SPLIT (name, ' ') cs
),
formatted as (
select
id,
name,
left(name, 1) + '***' + right(name, 1) as out
from split
)
SELECT
id,
(SELECT ' ' + out
FROM formatted b
WHERE a.id = b.id
FOR XML PATH('')) [out_name]
FROM formatted a
GROUP BY id
Result:
id out_name
1 J***n S***h
2 E***r A***n P***e
3 O***e T***o T***e F***r
You can do that using this function.
create function [dbo].[fnMaskName] (#var_name varchar(100))
RETURNS varchar(100)
WITH EXECUTE AS CALLER
AS
BEGIN
declare #var_part varchar(100)
declare #var_return varchar(100)
declare #n_position smallint
set #var_return = ''
set #n_position = 1
WHILE #n_position<>0
BEGIN
SET #n_position = CHARINDEX(' ', #var_name)
IF #n_position = 0
SET #n_position = LEN(#var_name)
SET #var_part = SUBSTRING(#var_name, 1, #n_position)
SET #var_name = SUBSTRING(#var_name, #n_position+1, LEN(#var_name))
if #var_part<>''
SET #var_return = #var_return + stuff(#var_part, 2, len(#var_part)-2, replicate('*',len(#var_part)-2)) + ' '
END
RETURN(#var_return)
END

Is ordering done by actual column or alias?

Let's say I have a simple query like this:
select
subgroup,
subgroup + ' (' + cast(grade as varchar(1)) + 'G)' as grade,
count(*) as 'count'
From table_empl
where year(EnterDate) = year(getdate())
group by subgroup, grade
order by grade
It seems that order by grade is being ordered by the alias grade instead of the actual column grade; at least that's what the result shows.
Is this correct?
Since I can't change the columns that are included in the result, is the solution to add an alias to the actual column? Something like this?
select
grade as 'grade2',
subgroup,
subgroup + ' (' + cast(grade as varchar(1)) + 'G)' as grade,
count(*) as 'count'
From table_empl
where year(EnterDate) = year(getdate())
group by subgroup,grade
order by grade2
If you prefix the column name by its table name (or an alias given to the table in the FROM clause) in the ORDER BY clause, then it will use the column, not the expression computed in the SELECT clause and given the same name as the column.
So this should sort using the original grade column:
select
subgroup,
subgroup + ' (' + cast(grade as varchar(1)) + 'G)' as grade,
count(*) as 'count'
From table_empl
where year(EnterDate) = year(getdate())
group by subgroup, grade
order by table_empl.grade
Or:
select
subgroup,
subgroup + ' (' + cast(grade as varchar(1)) + 'G)' as grade,
count(*) as 'count'
From table_empl t
where year(EnterDate) = year(getdate())
group by subgroup, grade
order by t.grade
Instruction Order By runs after all instructions, even Select. And in this case it's correct to take alias instead actual column.
The clauses are processed in the following order:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
You can use name(Alias) of table to specify table column
A very good question. Apparently, the official documentation does not provide a direct answer to it. However, one can imply the observed behaviour from the following fact: the difference between column alias and column is that the latter can be prefixed with its parent table name (alias), whereas the former cannot.
Since you didn't specify the table name in the ORDER BY clause, the column alias takes root.

Adding number to text rows sql server

I have a columns named id and item and there are stored values like:
id item
1 value
2 value
3 value
etc. There are 192 rows. These values are in the system in different places and I need to find concrete value in database to change it to the name I need.
Is there some posibility to add number to rows, for example value_01, value_02 etc.
I know how to do it in C language, but have no idea how to do it in sql server.
Edited:
#lad2025
In system I have columns, that names are stored in database.
Names are same, for example:
In app Apple I have table name Apple
In app Storage I also have table name Apple
I need to change app Storage columns name Apple to different, but I dont know, which of databasa Apple values it is, so I want to add identifiers to string, to find the right one. So I need to update database values, to see them in system.
SQLFiddleDemo
DECLARE #pad INT = 3;
SELECT
[id],
[item] = [item] + '_' + RIGHT(REPLICATE('0', #pad) + CAST([id] AS NVARCHAR(10)), #pad)
FROM your_table;
This will produce result like:
value_001
value_010
value_192
EDIT:
After reading your comments it is not clear what you want to achieve, but check:
SqlFiddleDemo2
DECLARE #pad INT = 3;
;WITH cte AS
(
SELECT *,
[rn] = ROW_NUMBER() OVER (PARTITION BY item ORDER BY item)
FROM your_table
)
SELECT
[id],
[item] = [item] + '_' + RIGHT(REPLICATE('0', #pad) + CAST([rn] AS NVARCHAR(10)), #pad)
FROM cte
WHERE item = 'value'; /* You can comment it if needed */

How does a recursive CTE eliminate duplicates?

I'm learning recursive CTEs in the AdventureWorks2012 database using SQL Server 2014 Express. I think I'm mostly getting the below example (taking from Beginning T-SQL 3rd Edition), but I don't quite understand why the recursive CTE doesn't produce duplicates.
Below is the recursive CTE that I'm trying to understand, it's a standard employee - manager hierarchy.
;with orgchart (employeeid, managerid, title, level, node) as (
--Anchor
select employeeid
, managerid
, title
, 0
, convert(varchar(30),'/') 'node'
from employee
where managerid is null
union all
--Recursive
select emp.employeeid
, emp.managerid
, emp.title
, oc.level + 1
, convert(varchar(30), oc.node + convert(varchar(30),emp.managerid) + '/')
from employee emp
inner join orgchart oc on oc.employeeid = emp.managerid
)
select employeeid
, managerid
, space(level * 3) + title 'title'
, level
, node
from orgchart
order by node;
It works fine, but the question comes when I try to understand what's going on by recreating it via temp tables. I create a series of temp tables to plug one output into the next query's input and recreate what the recursive CTE does.
--Anchor (Level 0)
select employeeid
, managerid
, title
, 0
, convert(varchar(30),'/') 'node'
into #orgchart
from employee
where managerid is null
Then I use that temp table to recreate the first level of recursion, at this point it's just the recursive CTE but with temp tables.
--Anchor + 1 level
select *
into #orgchart2
from #orgchart
union all
select emp.employeeid
, emp.managerid
, emp.title
, oc.level + 1
, convert(varchar(30), oc.node + convert(varchar(30),emp.managerid) + '/')
from employee emp
inner join #orgchart oc on oc.employeeid = emp.managerid
So far so good, the results make sense. Then I do it one more time, but here's where it starts to break down:
--Anchor + 2 levels
select *
into #orgchart3
from #orgchart2
union all
select emp.employeeid
, emp.managerid
, emp.title
, oc.level + 1
, convert(varchar(30), oc.node + convert(varchar(30),emp.managerid) + '/')
from employee emp
inner join #orgchart2 oc on oc.employeeid = emp.managerid
The output from this begins to return duplicate rows (all fields duplicate) of the level 1 employees. This makes sense - the second query after the UNION ALL will return the previous levels as well as the new level of recursion, and UNION ALL doesn't duplicate. If I do another round of recursion, the level 2 employees are also duplicated, and so on.
I understand that I can change UNION ALL to UNION in order to remove duplicates, but I'm trying to understand why the recursive CTE doesn't produce duplicates as well? It uses UNION ALL so I don't understand where the deduplication comes in. Is removal of duplicates an intrinsic part of a recursive CTE?
I'm trying to post all the result sets, but if they're needed to understand the problem then let me know and I will post them. Thanks in advance.
The difference is that when you populate your #orgchart2, you are including all the rows from #orgchart. So now when you create #orgchart3 (which represents a 3rd level of recursion), you are joining on the rows from #orgchart as well as #orgchart2.
So when you create the third level in #orgchart3, it is related to rows in both #orgchart and #orgchart2, when it should only be related to #orgchart2. Instead your third level includes rows that are one level beyond the 2nd level, but also one level beyond the anchor level, so you are duplicating rows, since you already have rows in the second level that are one level beyond the anchor level.
The optimizer knows not to do that with recursive CTEs. Each level of recursion only looks at the previous one and ignores all the ones that came before it. So no duplicates are created.
You would simulate what the optimizer does if you left out the top half of the UNION ALL when you populated #orgchart2 and #orgchart3, and then finally produced a single UNION ALL of all three temp tables.

SQL Server - Insert into with select and union - duplicates being inserted

When I execute my "select union select", I get the correct number or rows (156)
Executed independently, select #1 returns 65 rows and select #2 returns 138 rows.
When I use this "select union select" with an Insert into, I get 203 rows (65+138) with duplicates.
I would like to know if it is my code structure that is causing this issue ?
INSERT INTO dpapm_MediaObjectValidation (mediaobject_id, username, checked_date, expiration_date, notified)
(SELECT FKMediaObjectId, CreatedBy,#checkdate,dateadd(ww,2,#checkdate),0
FROM dbo.gs_MediaObjectMetadata
LEFT JOIN gs_MediaObject mo
ON gs_MediaObjectMetadata.FKMediaObjectId = mo.MediaObjectId
WHERE UPPER([Description]) IN ('CAPTION','TITLE','AUTHOR','DATE PHOTO TAKEN','KEYWORDS')
AND FKMediaObjectId >=
(SELECT TOP 1 MediaObjectId
FROM dbo.gs_MediaObject
WHERE DateAdded > #lastcheck
ORDER BY MediaObjectId)
GROUP BY FKMediaObjectId, CreatedBy
HAVING count(*) < 5
UNION
SELECT FKMediaObjectId, CreatedBy,getdate(),dateadd(ww,2,getdate()),0
FROM gs_MediaObjectMetadata yt
LEFT JOIN gs_MediaObject mo
ON yt.FKMediaObjectId = mo.MediaObjectId
WHERE UPPER([Description]) = 'KEYWORDS'
AND FKMediaObjectId >=
(SELECT TOP 1 MediaObjectId
FROM dbo.gs_MediaObject
WHERE DateAdded > #lastcheck
ORDER BY MediaObjectId)
AND NOT EXISTS
(
SELECT *
FROM dbo.fnSplit(Replace(yt.Value, '''', ''''''), ',') split
WHERE split.item in (SELECT KeywordEn FROM gs_Keywords) or split.item in (SELECT KeywordFr FROM gs_Keywords)
)
)
I would appreciate any clues into resolving this problem ...
Thank you !
The UNION keyword should only return distinct records between the two queries. However, if I recall correctly, this is only true if the datatypes are the same. The date variables might be throwing that off. Depending on the collation type, whitespace might be handled differently as well. You might want to do a SELECT DISTINCT on the dpapm_MediaObjectValidation table after doing your insert, but be sure to trim whitespace from both sides in your comparison. Another approach is to do your first insert, then on your second insert, forgo the UNION altogether and do a manual EXISTS check to see if the items to be inserted already exist.

Resources