ORDER BY not behaving as expected - sql-server

Good day.
I have the following query:
;
WITH ranked AS (
SELECT
p.id_price as p_id_price,
p.id_service as p_id_service,
p.name as p_name,
p.name_original as p_name_original,
p.id_producer_country as p_id_producer_country,
p.id_firm as p_id_firm,
f.name as f_name,
f.address as f_address,
f.phone as f_phone,
city.name as city_name,
pc.name as pc_name,
ROW_NUMBER() OVER (
PARTITION BY p.id_firm
ORDER BY
CASE -- this criterion puts matching products before non-matching ones
WHEN p.name like '%test%' COLLATE SQL_Latin1_General_Cp1251_CI_AS
THEN 1 ELSE 2
END,
p.id_price -- you may use any sorting criteria at this point,
-- just ensure it makes the results predictable
) AS rnk
FROM Price p
left join Firm f
on f.id_service=p.id_service
AND f.id_city=p.id_city
AND f.id_firm=p.id_firm
left join City city
on city.id_city = p.id_city
left join Producer_country pc
on pc.id_producer_country = p.id_producer_country
WHERE p.id_city='73041'
AND p.include='1'
AND p.blocked='0'
AND f.blocked='0'
AND ( f.name like '%test%' COLLATE SQL_Latin1_General_Cp1251_CI_AS
OR p.name like '%test%' COLLATE SQL_Latin1_General_Cp1251_CI_AS )
)
SELECT *
FROM ranked
WHERE rnk = 1
ORDER BY CASE WHEN f_name LIKE '%$..' THEN 0 ELSE 1 END,
f_name
;
The ORDER BY does not work as I expect.
I need to sort f_name in ascending order of the number after the $ sign.
Why doesn't the ORDER BY behave as expected?

f.name is only valid inside the WITH clause. When you order the records again in the outer query, use the alias you defined there:
CASE WHEN f_name LIKE '%$..' THEN 0 ELSE 1 END

After further discussion, it turned out that the same string needs to be sorted in two ways: first by whether it contains a $ character at all, and then by the text that follows the $ character. Given that, the ORDER BY would be:
ORDER BY
CASE WHEN f_name LIKE '%$%' THEN 0 ELSE 1 END,
SUBSTRING(f_name, CHARINDEX('$', f_name) + 1, LEN(f_name) - CHARINDEX('$', f_name)) ASC
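Two details may be worth checking here. In a T-SQL LIKE pattern, `.` is an ordinary character (the single-character wildcard is `_`), so `'%$..'` only matches names that end in a literal `$..`. Also, SUBSTRING returns a string, so `$10` would sort before `$9`. A minimal sketch of a numeric sort, assuming SQL Server 2012 or later (for TRY_CAST) and that the text after the `$` is an integer; the table variable and its rows are made-up stand-ins for the output of the ranked CTE:

```sql
-- Hypothetical sample data standing in for the rows returned by "ranked".
DECLARE @ranked TABLE (f_name varchar(100) NOT NULL);

INSERT INTO @ranked (f_name)
VALUES ('Firm A $10'), ('Firm B $9'), ('Firm C'), ('Firm D $100');

SELECT f_name
FROM @ranked
ORDER BY
    -- names containing a $ come first
    CASE WHEN f_name LIKE '%$%' THEN 0 ELSE 1 END,
    -- then sort numerically by whatever follows the $;
    -- TRY_CAST yields NULL instead of an error when the text is not an integer
    TRY_CAST(SUBSTRING(f_name, CHARINDEX('$', f_name) + 1, LEN(f_name)) AS int),
    -- tie-breaker so the order stays predictable
    f_name;
```

With this sample data the names with a `$` come back in the order 9, 10, 100, followed by the name without one.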

Related

Getting non-deterministic results from WITH RECURSIVE cte

I'm trying to create a recursive CTE that traverses all the records for a given ID, and does some operations between ordered records. Let's say I have customers at a bank who get charged a uniquely identifiable fee, and a customer can pay that fee in any number of installments:
WITH recursive payments (
id
, index
, fees_paid
, fees_owed
)
AS (
SELECT id
, index
, fees_paid
, fee_charged
FROM table
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, t.fees_paid
, p.fees_owed - p.fees_paid
FROM table t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
The join logic seems sound, but when I join the output of this query to the source table, I'm getting non-deterministic and incorrect results.
This is my first foray into Snowflake's recursive CTEs. What am I missing in the intermediate result logic that is leading to the non-determinism here?
I assume this is edited code, because in the anchor of your CTE you select a fourth column, fee_charged, which does not exist, and in the recursive part you don't sum the fees paid. Basically, your logic seems rather strange.
So creating some random data, that has two different id streams to recurse over:
create or replace table data (id number, index number, val text);
insert into data
select * from values (1,1,'a'),(2,1,'b')
,(1,2,'c'), (2,2,'d')
,(1,3,'e'), (2,3,'f')
v(id, index, val);
Now altering your CTE just a little bit to concatenate those strings together:
WITH RECURSIVE payments AS
(
SELECT id
, index
, val
FROM data
WHERE index = 1
UNION ALL
SELECT t.id
, t.index
, p.val || t.val as val
FROM data t
JOIN payments p
ON t.id = p.id
AND t.index = p.index + 1
)
SELECT *
FROM payments
ORDER BY 1,2;
we get:
ID INDEX VAL
1 1 a
1 2 ac
1 3 ace
2 1 b
2 2 bd
2 3 bdf
Which is exactly what I would expect. As for how this relates to your "it gets strange when I join to other stuff": either the output of your CTE is not what you expect it to be, or your join to the other data is not working as you expect, or there is a bug in Snowflake.
It all comes down to this: if the CTE results are exactly what you expect, create a table from them and join that to your other table. That eliminates some form of CTE vs. JOIN bug and lets you debug why your join is not working.
But if your CTE output is not what you expect, then let's debug that.
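The "create a table and join that" step could look roughly like the sketch below. It reuses the data table from this answer; the name payments_out and the final join are assumptions made for illustration, not something taken from the question:

```sql
-- Materialize the recursive CTE output once, so the join below reads
-- fixed rows instead of re-evaluating the CTE (Snowflake syntax).
CREATE OR REPLACE TEMPORARY TABLE payments_out AS
WITH RECURSIVE payments AS
(
    SELECT id, index, val
    FROM data
    WHERE index = 1
    UNION ALL
    SELECT t.id, t.index, p.val || t.val AS val
    FROM data t
    JOIN payments p
      ON t.id = p.id
     AND t.index = p.index + 1
)
SELECT * FROM payments;

-- Join the materialized rows back to the source table.
-- If this result is stable but the inline version was not,
-- the problem lies in how the CTE and the join interact.
SELECT d.id, d.index, d.val AS source_val, o.val AS accumulated_val
FROM data d
LEFT JOIN payments_out o
  ON o.id = d.id
 AND o.index = d.index
ORDER BY d.id, d.index;
```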

Trying to get tie results on column not included in order by

Given a result set sorted in descending order on the Date column,
I want to use the TOP clause like:
select top 4 *
from donation d
order by d.Date desc;
This gives me the first four rows.
Even though I limit my result with TOP 4, I want to be able to include the next row as well, because it ties with the last returned record (id: 5) on Name.
This query first selects the names of the people in the top 4 dates (in a common table expression), then uses these names to show all their data. I guess that is what you want.
;WITH Top4Names AS
(
SELECT TOP 4 [Name]
FROM donation
ORDER BY [Date] DESC
)
SELECT d.id, d.[Name], d.Amount, d.[Date]
FROM donation d
INNER JOIN Top4Names t
ON t.[Name] = d.[Name]
ORDER BY d.[Date] DESC;
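One thing to watch for with the join: if the same name occurs more than once among the top 4 dates, each of that person's rows is returned once per occurrence. A possible variant that avoids the repetition filters with IN instead of joining. This is a sketch against the same donation table and columns used above, not a tested drop-in:

```sql
;WITH Top4Names AS
(
    -- names attached to the four most recent dates (may contain repeats)
    SELECT TOP (4) [Name]
    FROM donation
    ORDER BY [Date] DESC
)
SELECT d.id, d.[Name], d.Amount, d.[Date]
FROM donation d
-- IN is a membership test, so repeated names in Top4Names
-- do not multiply the rows coming out of donation
WHERE d.[Name] IN (SELECT t.[Name] FROM Top4Names t)
ORDER BY d.[Date] DESC;
```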

Group by custom function in T-SQL

I have a table of people in which there may be duplicates. My goal is to return a list of possible duplicates so that we can combine them into a new person.
I want to group by first_name and last_name, obviously. However, if both person records have a defined birth_date and those dates differ, then I want to exclude the records, since odds are the people are different but happen to have the same name.
The other caveat is that in our system (which I inherited), the birth_date column is NOT NULL, and non-specified birth_dates are set to '1900-01-01'.
Is there a way I can GROUP BY a custom function (or use some other clever logic) that either compares just the birth_date columns checking to see if both are not the default date or they are if the same, or else takes in arguments, like say each person_id and compares the records against each other, returning a BIT to decide whether they should count as the same group?
I'd like to avoid CLR-defined aggregate functions (since I'm inexperienced with it).
So far (without the birth_date comparison) the query I have is:
SELECT *
FROM core_person P
WHERE last_name + ',' + first_name IN
(SELECT last_name + ',' + first_name "name"
FROM core_person
GROUP BY last_name + ',' + first_name
HAVING COUNT(*) > 1)
ORDER BY last_name + ',' + first_name
I would like to add something to the GROUP BY clause to compare the birth dates.
You can use the NULLIF function to your advantage: NULLIF(Birthday, '1/1/1900') returns NULL when the date is equal to 1/1/1900.
This query can get you started to see all the records with their possible matches:
select p1.person_id
from core_person p1
join core_person p2
on p1.person_id <> p2.person_id
and LEFT(p1.first_name,5) = LEFT(p2.first_name,5)
and LEFT(p1.last_name,5) = LEFT(p2.last_name,5)
and isnull(nullif(p1.Birthday,'1/1/1900'), p2.Birthday) = isnull(nullif(p2.Birthday,'1/1/1900'), p1.Birthday)
group by p1.person_id
If either Birthday is equal to 1/1/1900, the other record's birthday is compared to itself, so the pair always matches; otherwise the join only succeeds when the birthdays in both records are equal.
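To see how the ISNULL/NULLIF comparison behaves, here is a small sketch with made-up rows. The table variable, the sample people, and the use of exact name equality instead of LEFT(..., 5) are all assumptions for illustration:

```sql
-- Hypothetical sample data; 1900-01-01 is the "unknown" placeholder date.
DECLARE @core_person TABLE
(
    person_id  int         NOT NULL PRIMARY KEY,
    first_name varchar(50) NOT NULL,
    last_name  varchar(50) NOT NULL,
    Birthday   date        NOT NULL
);

INSERT INTO @core_person (person_id, first_name, last_name, Birthday)
VALUES (1, 'Ann', 'Lee', '19800501'),
       (2, 'Ann', 'Lee', '19000101'),  -- unknown birthday
       (3, 'Ann', 'Lee', '19750202'),
       (4, 'Bob', 'Ray', '19900101');

-- List each candidate pair once (p1.person_id < p2.person_id).
SELECT p1.person_id AS person_id,
       p2.person_id AS possible_match
FROM @core_person p1
JOIN @core_person p2
  ON p1.person_id < p2.person_id
 AND p1.first_name = p2.first_name
 AND p1.last_name  = p2.last_name
 -- a placeholder date on either side falls back to the other side's date,
 -- so the comparison only fails when both dates are known and different
 AND ISNULL(NULLIF(p1.Birthday, '19000101'), p2.Birthday)
   = ISNULL(NULLIF(p2.Birthday, '19000101'), p1.Birthday);
```

With these rows the pairs (1, 2) and (2, 3) are reported, while (1, 3) is not, because both of those birthdays are known and differ.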
If you don't want to see the matches, you can use a variation of the query above as a subquery to return only the rows whose id values are duplicates:
select *
from core_person
where person_id in
(
select p1.person_id
from core_person p1
join core_person p2
on p1.person_id <> p2.person_id
and LEFT(p1.first_name,5) = LEFT(p2.first_name,5)
and LEFT(p1.last_name,5) = LEFT(p2.last_name,5)
and isnull(nullif(p1.Birthday,'1/1/1900'), p2.Birthday) = isnull(nullif(p2.Birthday,'1/1/1900'), p1.Birthday)
group by p1.person_id
)
Instead of grouping, will something like this work for you?
select * from MyTable a
left join MyTable b
on a.person_id < b.person_id
and a.first_name = b.first_name
and a.last_name = b.last_name
and (
a.birthdate = b.birthdate
or a.birthdate = '1900-1-1'
or b.birthdate = '1900-1-1'
)
It matches rows where last name and first name match, and either birthdates match or one birthdate is your placeholder value. The person_ID part of the join gets rid of duplicates (e.g. 1 matches to 2, then another row where 2 matches to 1, or 1 matches to 1).
You may want to broaden the match criteria for the names to look at first few characters or use SOUNDEX, but then your matches would probably require more hand-sorting as a final step.
Edit: to return a list of all records that have a possible duplicate in the table, not associated with their matches, use this instead:
select distinct a.* from MyTable a
inner join MyTable b
on a.person_id <> b.person_id
and a.first_name = b.first_name
and a.last_name = b.last_name
and (
a.birthdate = b.birthdate
or a.birthdate = '1900-1-1'
or b.birthdate = '1900-1-1'
)
order by a.first_name, a.last_name, a.birthdate

where not in / where not like subquery

Can somebody help me out with an MS-SQL query, please?
I have the following:
select Name from Keyword.dbo.NGrams
where Name not in (select Name from Keyword.dbo.Brands)
What I really want is something like this, but I can't get the syntax right
select Name from Keyword.dbo.NGrams
where Name not like (select Name from Keyword.dbo.Brands)
"not in" works great for NGrams & Brands that match exactly. But my NGrams are multiple words long and some contain a Brand within them.
Thanks so much
Edit: Maybe I can clarify what I am looking for with this pseudo-SQL:
select Name from Keyword.dbo.NGrams
where Description not containing (select Word from Keyword.dbo.Brands)
Brand is a list of single words. Description in NGrams would be a 2 or 3 word phrase. I want to select all the NGrams that do not contain any of the Brands
SELECT
n.Name
FROM Keyword.dbo.NGrams n
LEFT JOIN Keyword.dbo.Brands b
ON n.Name LIKE '%'+b.Name+'%'
WHERE b.Name IS NULL
If you want to avoid the Scunthorpe Problem and only match whole words, change the join condition to:
ON ' '+n.Name+' ' LIKE '% '+b.Name+' %'
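The difference between the two join conditions can be seen with a couple of made-up rows. The table variables and their contents below are hypothetical and only mirror the Name columns from the question:

```sql
-- Hypothetical sample data.
DECLARE @NGrams TABLE (Name varchar(100) NOT NULL);
DECLARE @Brands TABLE (Name varchar(100) NOT NULL);

INSERT INTO @NGrams (Name)
VALUES ('red apple pie'), ('pineapple juice'), ('green tea');

INSERT INTO @Brands (Name)
VALUES ('apple');

-- Substring match: 'pineapple juice' is excluded too,
-- because 'apple' occurs inside 'pineapple'.
SELECT n.Name
FROM @NGrams n
LEFT JOIN @Brands b
  ON n.Name LIKE '%' + b.Name + '%'
WHERE b.Name IS NULL;

-- Whole-word match: padding both sides with spaces means the brand
-- must be delimited by spaces, so only 'red apple pie' is excluded.
SELECT n.Name
FROM @NGrams n
LEFT JOIN @Brands b
  ON ' ' + n.Name + ' ' LIKE '% ' + b.Name + ' %'
WHERE b.Name IS NULL;
```

The first query returns only 'green tea'; the second returns 'pineapple juice' and 'green tea'. Note that this treats only spaces as word boundaries, and that any %, _ or [ characters inside a brand name would be interpreted as LIKE wildcards.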
Use a where not exists to express the like:
select Name
from Keyword.dbo.NGrams ng
where not exists (
select *
from Keyword.dbo.Brands b
where ng.Name like '%' + b.name + '%'
)
I ran a test using the ENABLE2K standard English word list. I generated 10 million random ngrams and 50000 random brands. The query takes about 1 minute to run on my workstation.
CREATE TABLE #enable2k (word varchar(max) NOT NULL)
BULK INSERT #enable2k FROM 'C:\enable2k.txt'
CREATE TABLE #ngrams (ngram_id int NOT NULL, word_num int NOT NULL, word varchar(max) NOT NULL, PRIMARY KEY(ngram_id, word_num));
INSERT #ngrams SELECT TOP 10000000 ROW_NUMBER() OVER(ORDER BY NEWID()), 1, word FROM #enable2k,(SELECT TOP 58 0 FROM master..spt_values) t(i)
INSERT #ngrams SELECT TOP 10000000 ROW_NUMBER() OVER(ORDER BY NEWID()), 2, word FROM #enable2k,(SELECT TOP 58 0 FROM master..spt_values) t(i)
INSERT #ngrams SELECT TOP 10000000 ROW_NUMBER() OVER(ORDER BY NEWID()), 3, word FROM #enable2k,(SELECT TOP 58 0 FROM master..spt_values) t(i)
CREATE TABLE #brands (brand varchar(32) NOT NULL PRIMARY KEY)
INSERT #brands SELECT TOP 50000 word FROM #enable2k WHERE LEN(word) <= 32 ORDER BY NEWID()
SELECT *
FROM #ngrams n
PIVOT (MIN(word) FOR word_num IN ([1],[2],[3])) n1
WHERE NOT EXISTS (
SELECT 1
FROM #ngrams n2
INNER JOIN #brands b
ON (n2.word = b.brand)
WHERE n1.ngram_id = n2.ngram_id
)

How to develop a recursive CTE in T-SQL?

I am new to recursive CTEs. I am trying to develop a CTE which will return all of the employees under each manager name. So I have two tables: people_rv and staff_rv
People_rv table contains all of the people, both managers and employees. Staff_rv only contains manager information. Uniqueidentifier staff values are stored in Staff_rv. Uniqueidentifier employee values are stored in people_rv. People_rv contains varchar first and last name values for both managers and employees.
But when I run the following CTE I get an error:
WITH
cteStaff (ClientID, FirstName, LastName, SupervisorID, EmpLevel)
AS
(
SELECT p.people_id, p.first_name, p.last_name, s.supervisor_id,1
FROM people_rv p JOIN staff_rv s on s.people_id = p.people_id
WHERE s.supervisor_id = '95E16819-8C3A-4098-9430-08F0E3B764E1'
UNION ALL
SELECT p2.people_id, p2.first_name, p2.last_name, s2.supervisor_id, r.EmpLevel + 1
FROM people_rv p2 JOIN staff_rv s2 on s2.people_id = p2.people_id
INNER JOIN cteStaff r on s2.staff_id = r.ClientID
)
SELECT
FirstName + ' ' + LastName AS FullName,
EmpLevel,
(SELECT first_name + ' ' + last_name FROM people_rv p join staff_rv s on s.people_id = p.people_id
WHERE s.staff_id = cteStaff.SupervisorID) AS Manager
FROM cteStaff
OPTION (MAXRECURSION 0);
My output is:
Barbara G 1 Melanie K
Dawn P 1 Melanie K
Garrett M 1 Melanie K
Stephanie P 1 Melanie K
Amanda F 1 Melanie K
Amanda T 1 Melanie K
Stephanie G 1 Melanie K
Carlos H 1 Melanie K
So it is not iterating any more than the first level. Why not?
Melanie is the top most supervisor, but each of the persons in the leftmost column are also supervisors. So this query should also return level 2.
You may be in an infinite loop with your join. I would check how many levels you expect the table to actually go down. Generally, you join the recursive part on something like
ID = ParentID
where both values come from a table or an expression. Keep in mind that you can also create a CTE before the recursive CTE if you have to build the relationship yourself.
Here is a self-contained example that you can run; it may help.
Declare @Table table ( PersonId int identity, PersonName varchar(512), Account int, ParentId int, Orders int);
insert into @Table values ('Brett', 1, NULL, 1000),('John', 1, 1, 100),('James', 1, 1, 200),('Beth', 1, 2, 300),('John2', 2, 4, 400);
select
PersonID
, PersonName
, Account
, ParentID
from @Table
; with recursion as
(
select
t1.PersonID
, t1.PersonName
, t1.Account
--, t1.ParentID
, cast(isnull(t2.PersonName, '')
+ Case when t2.PersonName is not null then '\' + t1.PersonName else t1.PersonName end
as varchar(255)) as fullheirarchy
, 1 as pos
, cast(t1.orders +
isnull(t2.orders,0) -- if the parent has no orders, then zero
as int) as Orders
from @Table t1
left join @Table t2 on t1.ParentId = t2.PersonId
union all
select
t.PersonID
, t.PersonName
, t.Account
--, t.ParentID
, cast(r.fullheirarchy + '\' + t.PersonName as varchar(255))
, pos + 1 -- increases with each level of recursion
, r.orders + t.orders
from @Table t
join recursion r on t.ParentId = r.PersonId
)
, b as
(
select *, max(pos) over(partition by PersonID) as maxrec -- I find the maximum occurrence of position by person
from recursion
)
select *
from b
where pos = maxrec -- finds the furthest down tree
-- and Account = 2 -- I could find just someone from a different department
Your problem, as far as I can tell, is that you have no join connecting managers to their employees.
This join
INNER JOIN cteStaff r on r.StaffID = s2.staff_id
Just joins the same initial level 1 staffer back to himself.
UPDATE:
Still not quite right! You have a supervisor_id, but again you're still not actually using that to join back to the CTE.
So for each recursion of this CTE you need to (excluding the name join):
select {Level 1 Boss}, NULL (no supervisor)
union
select {new employee}, {that employee's boss}
So the join must connect the CTE's ClientID (the level 1 boss) to the second UNION query's supervisor field, which looks to be supervisor_id , not staff_id.
The JOIN to accomplish this second task is (from what I can tell of your staff_rv table schema):
SELECT p2.people_id, p2.first_name, p2.last_name, s2.supervisor_id, r.EmpLevel + 1
FROM people_rv p2 JOIN staff_rv s2 on s2.people_id = p2.people_id
INNER JOIN cteStaff r on s2.supervisor_id = r.ClientID
Note the bottom join joins the r.ClientID (the level 1 boss) to the staffer's supervisor_id field.
(NB: I think your staff_id and supervisor_id's mimic your people_id values from the people_rv table, so this join should work fine. But if they are different (i.e. a staffer's supervisor_id isn't that supervisor's people_id) then you'll need to write the join such that the staffer's supervisor_id can be joined to their people_id you're storing as ClientID in the CTE.)
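To make the corrected join concrete, here is a small runnable sketch. It assumes, as the note above does, that staff_id, supervisor_id, and people_id all share the same values. Integer ids replace the uniqueidentifier values for brevity, and the table variables and the row 'Eve X' are made up for illustration:

```sql
-- Hypothetical stand-ins for people_rv and staff_rv.
DECLARE @people_rv TABLE (people_id int PRIMARY KEY, first_name varchar(50), last_name varchar(50));
DECLARE @staff_rv  TABLE (staff_id int PRIMARY KEY, people_id int, supervisor_id int NULL);

INSERT INTO @people_rv (people_id, first_name, last_name)
VALUES (1, 'Melanie', 'K'), (2, 'Barbara', 'G'), (3, 'Dawn', 'P'), (4, 'Eve', 'X');

-- Melanie (1) supervises Barbara (2) and Dawn (3); Barbara supervises Eve (4).
INSERT INTO @staff_rv (staff_id, people_id, supervisor_id)
VALUES (1, 1, NULL), (2, 2, 1), (3, 3, 1), (4, 4, 2);

WITH cteStaff (ClientID, FirstName, LastName, SupervisorID, EmpLevel) AS
(
    -- anchor: everyone who reports directly to the top supervisor
    SELECT p.people_id, p.first_name, p.last_name, s.supervisor_id, 1
    FROM @people_rv p
    JOIN @staff_rv s ON s.people_id = p.people_id
    WHERE s.supervisor_id = 1
    UNION ALL
    -- recursive part: staff whose supervisor is already in the CTE
    SELECT p2.people_id, p2.first_name, p2.last_name, s2.supervisor_id, r.EmpLevel + 1
    FROM @people_rv p2
    JOIN @staff_rv s2 ON s2.people_id = p2.people_id
    JOIN cteStaff r ON s2.supervisor_id = r.ClientID
)
SELECT FirstName + ' ' + LastName AS FullName, EmpLevel, SupervisorID
FROM cteStaff
ORDER BY EmpLevel, FullName;
```

With this data Barbara G and Dawn P come back at level 1 and Eve X at level 2, which is the second level the original query was missing.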
Here's a good simple Recursive CTE to review (it may not be the answer, but someone else searching on how to make a recursive CTE may need it):
-- Recursive CTE
;
WITH Years ( myYear )
AS (
-- Base case
SELECT DATEPART(year, GETDATE())
UNION ALL
-- Recursive
SELECT Years.myYear - 1
FROM Years
WHERE Years.myYear >= 2002
)
SELECT *
FROM Years
Note that this probably won't solve your problem, but is a means to hopefully seeing where you're going wrong in the original query.
The default is 100 levels of recursion - you can set it to unlimited by using the MAXRECURSION query hint where you're selecting from your CTE:
...
FROM cteStaff
OPTION (MAXRECURSION 0);
From MSDN:
MAXRECURSION number
Specifies the maximum number of recursions allowed for this query. number is a nonnegative integer between 0 and 32767. When 0 is specified, no limit is applied. If this option is not specified, the default limit for the server is 100.
When the specified or default number for MAXRECURSION limit is reached during query execution, the query is ended and an error is returned.
Because of this error, all effects of the statement are rolled back. If the statement is a SELECT statement, partial results or no results may be returned. Any partial results returned may not include all rows on recursion levels beyond the specified maximum recursion level.
