STRING_SPLIT with Row Number - sql-server

Here is what I am trying to produce:
Row_Num Person Value Row_Number
1 Leo Math 1
1 Leo Science 2
1 Leo History 3
1 Leo Math,Science,History 4
2 Robert Gym 1
2 Robert Math 2
2 Robert History 3
2 Robert Gym,Math,History 4
3 David Art 1
3 David Science 2
3 David English 3
3 David History 4
3 David Computer Science 5
3 David Art,Science,English,History,Computer Science 6
This is the code I am using:
with part_1 as
(
select
1 as [Row_Num],
'Leo' as [Person],
'Math,Science,History' as [Subjects]
---
union
---
select
'2',
'Robert',
'Gym,Math,History'
---
union
---
select
'3',
'David',
'Art,Science,English,History,Computer Science'
---
)
----------------------------------------------------------------------
select
[Row_Num],
[Person],
[Subjects]
into
#part1
from
part_1;
go
--------------------------------------------------------------------------------
with p_2 as(
select
[Row_Num],
[Person],
--[Subjects],
[value]
from
#part1
cross apply
STRING_SPLIT([Subjects],',')
union all
select
[Row_Num],
[Person],
[Subjects]
from
#part1
)
select
[Row_Num]
,[Person]
,[Value]
,row_number()
over(Partition by Row_Num order by (select 1)) as [Row_Number]
from
p_2
order by
[Row_Num]
,[Row_Number]
Here is what I am producing:
Row_Num Person Value Row_Number
1 Leo Math 1
1 Leo Science 2
1 Leo History 3
1 Leo Math,Science,History 4
2 Robert Gym,Math,History 1
2 Robert Gym 2
2 Robert Math 3
2 Robert History 4
3 David Art 1
3 David Science 2
3 David English 3
3 David History 4
3 David Computer Science 5
3 David Art,Science,English,History,Computer Science 6
It looks good until you look at Robert: the row with the full subject list comes first instead of last.
Any suggestions?

STRING_SPLIT is documented to not "care" about ordinal position:
The output rows might be in any order. The order is not guaranteed to match the order of the substrings in the input string. You can override the final sort order by using an ORDER BY clause on the SELECT statement (ORDER BY value).
If the ordinal position of the data is important, don't use STRING_SPLIT. Personally, I recommend using delimitedsplit8k_LEAD, which includes an itemnumber column.
But ideally, the real solution is to stop storing delimited data in your database. Create 2 further tables, one with a list of the subjects, and another that creates a relationship between the student and subject.
Note that SQL Server 2022 brings a new optional third parameter to STRING_SPLIT called enable_ordinal which, when 1 is passed to it, causes STRING_SPLIT to return an additional column (called ordinal) with the 1-based ordinal position of the value within the string; so you could add that column to your ORDER BY to ensure the ordering is maintained.
Of course, this doesn't change the fact that you should not be storing delimited data to start with, and should still be aiming to fix your design.
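As a minimal sketch of the ordinal approach, assuming SQL Server 2022 or later (or Azure SQL Database), where the third enable_ordinal argument is available: the query below reuses the question's sample data, sorts the split values by ordinal, and uses two helper columns (is_summary and item_pos, names made up for this illustration) to push the full subject list to the end of each group.

```sql
WITH part_1 AS (
    SELECT *
    FROM (VALUES
        (1, 'Leo',    'Math,Science,History'),
        (2, 'Robert', 'Gym,Math,History'),
        (3, 'David',  'Art,Science,English,History,Computer Science')
    ) AS t ([Row_Num], [Person], [Subjects])
),
p_2 AS (
    -- One row per subject; ordinal is the 1-based position within the string
    SELECT p.[Row_Num], p.[Person], s.[value] AS [Value],
           0 AS is_summary, s.[ordinal] AS item_pos
    FROM part_1 AS p
    CROSS APPLY STRING_SPLIT(p.[Subjects], ',', 1) AS s
    UNION ALL
    -- One summary row per person, flagged so that it sorts after the subjects
    SELECT p.[Row_Num], p.[Person], p.[Subjects], 1, 0
    FROM part_1 AS p
)
SELECT [Row_Num], [Person], [Value],
       ROW_NUMBER() OVER (PARTITION BY [Row_Num]
                          ORDER BY is_summary, item_pos) AS [Row_Number]
FROM p_2
ORDER BY [Row_Num], [Row_Number];
```

With an explicit sort key, Robert's rows should come out as Gym 1, Math 2, History 3, and the full list 4, regardless of the order in which the rows are produced internally.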

Here's an easy solution.
DECLARE @x VARCHAR(1000) = 'a,b,c,d,e,f,g';
DECLARE @t TABLE
(
    [Index] INT PRIMARY KEY IDENTITY(1, 1)
    , [Value] VARCHAR(50)
)
INSERT INTO @t ([Value])
SELECT [Value]
FROM string_split(@x, ',')
SELECT * FROM @t
Wrap it like this:
CREATE FUNCTION SPLIT_STRING2
(
    @x VARCHAR(5000)
    , @y VARCHAR(5000)
) RETURNS @t TABLE
(
    [Index] INT PRIMARY KEY IDENTITY(1, 1)
    , [Value] VARCHAR(50)
)
AS
BEGIN
    INSERT INTO @t ([Value])
    SELECT [Value]
    FROM string_split(@x, @y)
    RETURN
END
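An IDENTITY column filled by INSERT ... SELECT without an ORDER BY only records the order in which the rows happened to arrive, so the numbering above still depends on STRING_SPLIT's unguaranteed output order. On SQL Server 2016 or later (compatibility level 130 or higher), a commonly used alternative is to turn the list into a JSON array and read it with OPENJSON, whose key column is the zero-based array index. This is a sketch offered as an assumption, not part of the answer above.

```sql
DECLARE @x VARCHAR(1000) = 'a,b,c,d,e,f,g';

SELECT CAST(j.[key] AS INT) + 1 AS [Index],   -- [key] is the zero-based array index
       j.[value]                AS [Value]
FROM OPENJSON(
         -- Builds '["a","b",...]'; STRING_ESCAPE protects quotes and backslashes
         '["' + REPLACE(STRING_ESCAPE(@x, 'json'), ',', '","') + '"]'
     ) AS j
ORDER BY [Index];
```

The sketch assumes a single-character comma delimiter; a different delimiter would go into the REPLACE call.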

Here is a recursive CTE method to parse Subjects. There is one anchor query and two recursive queries. The first recursive query parses the subjects. The second recursive query adds the summary row. I added a special case for a single subject: there the summary row and the parsed-subject row would be identical, so the summary row is filtered out in this example.
This only works because each iteration of a recursive CTE sees only the records from the prior iteration. When building set N+1 and referring back to the CTE, the CTE holds the records from set N, not sets 1 through N. (It reminds me of proof by induction: prove the case n = 1, then prove that if n holds then n+1 holds; it then holds for all n > 0.)
DECLARE @delimiter char(1) = ',';
WITH part_1 as (
-- Subjects has 0 or more "tokens" separated by a comma
SELECT *
FROM (values
(1, 'Leo', 'Math,Science,History'),
(2, 'Robert', 'Gym,Math,History'),
(3, 'David', 'Art,Science,English,History,Computer Science'),
(4, 'Lazy', 'Art')
) t ([Row_Num],[Person],[Subjects])
), part_2 as (
-- Anchor on the first token. Every token has a delimiter before and after, even if we have to pretend it exists
SELECT Row_Num, Person, Subjects,
LEN(Subjects) + 1 as [index_max], -- the last index possible is a "pretend" index just after the end of the string
1 as N, -- this is the first token
0 as [index_before], -- for the first token, pretend the delimiter exists just before the first character at index 0
CASE WHEN CHARINDEX(@delimiter, Subjects) > 0 THEN CHARINDEX(@delimiter, Subjects) -- delimiter after exists
ELSE LEN(Subjects) + 1 -- pretend the delimiter exists just after the end of the string at index len + 1
END as [index_after],
CAST(1 as bit) as [is_token] -- needed to stop the 2nd recursion
FROM part_1
-- Recursive part that checks records for token N to add records for token N + 1 if it exists
UNION ALL
SELECT Row_Num, Person, Subjects,
index_max,
N + 1,
index_after, -- the delimiter before is just the prior token's delimiter after.
CASE WHEN CHARINDEX(@delimiter, Subjects, index_after + 1) > 0 THEN CHARINDEX(@delimiter, Subjects, index_after + 1) -- delimiter after exists
ELSE index_max -- pretend the delimiter exists just after the end of the string at index len + 1
END,
CAST(1 as bit) as [is_token] -- needed to stop the 2nd recursion
FROM part_2 -- a recursive CTE has only the prior result for token N, not accumulated result of tokens 1 to N
WHERE index_after > 0 AND index_after < index_max
UNION ALL
-- Another recursive part that checks if the prior token is the last. If the last, add the record with full string that was just parsed.
SELECT Row_Num, Person, Subjects,
index_max,
N + 1, -- this is not a token
0, -- the entire original string is desired
index_max, -- the entire original string is desired
CAST(0 as bit) as [is_token] -- not a token - stops this recursion
FROM part_2 -- this has only the prior result for N, not accumulated result of 1 to N
WHERE index_after = index_max -- the prior token was the last
AND is_token = 1 -- it was a token - stops this recursion
AND N > 1 -- add this to remove the added record if it's identical - 1 token
)
SELECT Row_Num, Person, TRIM(SUBSTRING(Subjects, index_before + 1, index_after - index_before - 1)) as [token], N,
index_max, index_before, index_after, is_token
FROM part_2
ORDER BY Row_Num, N -- Row_Num, is_token DESC, N is not required
Row_Num Person token N index_max index_before index_after is_token
----------- ------ -------------------------------------------- ----------- ----------- ------------ ----------- --------
1 Leo Math 1 21 0 5 1
1 Leo Science 2 21 5 13 1
1 Leo History 3 21 13 21 1
1 Leo Math,Science,History 4 21 0 21 0
2 Robert Gym 1 17 0 4 1
2 Robert Math 2 17 4 9 1
2 Robert History 3 17 9 17 1
2 Robert Gym,Math,History 4 17 0 17 0
3 David Art 1 45 0 4 1
3 David Science 2 45 4 12 1
3 David English 3 45 12 20 1
3 David History 4 45 20 28 1
3 David Computer Science 5 45 28 45 1
3 David Art,Science,English,History,Computer Science 6 45 0 45 0
4 Lazy Art 1 4 0 4 1

Related

Finding A Time When A Value Changed

I am still learning many new things about SQL such as PARTITION BY and CTEs. I am currently working on a query which I have cobbled together from a similar question I found online. However, I cannot seem to get it to work as intended.
The problem is as follows: I have been tasked with showing rank promotions in an organization from the beginning of 2022 to today. I am working with 2 primary tables, an EMPLOYEES table and a PERIODS table. The PERIODS table captures a snapshot of each employee every month, including their rank at the time. Each of these months is also assigned a PeriodID (e.g. Jan 2022 = PeriodID 131). Our EMPLOYEES table holds the employee's current rank. These ranks are stored as an int (e.g. 1,2,3 with 1 being the lowest rank). It is possible for an employee to rank up more than once in any given month.
I have simplified the query as much as I can for the sake of this problem. The query is as follows:
;WITH x AS
(
SELECT
e.EmployeeID, p.PeriodID, p.RankID,
rn = ROW_NUMBER() OVER (PARTITION BY e.EmployeeID ORDER BY p.PeriodID DESC)
FROM employees e
LEFT JOIN periods p on p.EmployeeID= e.EmployeeID
WHERE p.PeriodID <= 131 AND p.PeriodID >=118 --This is the time range mentioned above
),
rest AS (SELECT * FROM x WHERE rn > 1)
SELECT
main.EmployeeID,
PeriodID = MIN(
CASE
WHEN main.CurrentRankID = Rest.RankID
THEN rest.PeriodID ELSE main.PeriodID
END),
main.RankID, rest.RankID
FROM x AS main LEFT OUTER JOIN rest ON main.EmployeeID = rest.EmployeeID
AND rest.rn >1
LEFT JOIN periods p on p.EmployeeID = e.EmployeeID
WHERE main.rn = 1
AND NOT EXISTS
(
SELECT 1 FROM rest AS rest2
WHERE EmployeeID = rest.EmployeeID
AND rn < rest.rn
AND main.RankID <> rest.RankID
)
and p.PeriodID <= 131 AND p.PeriodID >=118
GROUP BY main.EmployeeID, main.PeriodID, main.RankID, rest.RankID
As mentioned before, this query was borrowed from a similar question and modified for my own use. I imagine the bones of the query are good and maybe I have messed up a variable somewhere, but I cannot seem to locate the problem line. The end goal is for the query to result in a table showing the EmployeeID, the PeriodID, the rank they are being promoted from, and the rank they are being promoted to in the month the promotion was earned, similar to the below.
EmployeeID PeriodID PreviousRankID NewRank
---------- -------- -------------- -------
123        131      1              2
123        133      2              3
Instead, my query is spitting out repeating previous/current ranks and the PeriodIDs seem to be static (such as what is shown below).
EmployeeID PeriodID PreviousRankID NewRank
---------- -------- -------------- -------
123        131      1              1
123        131      1              1
I am hoping someone with a greater knowledge base on these functions is able to quickly notice my mistake.
If we assume some example DML/DDL (it's really helpful to provide this with your question):
DECLARE @Employees TABLE (EmployeeID INT IDENTITY, Name VARCHAR(20), RankID INT);
DECLARE @Periods TABLE (PeriodID INT, EmployeeID INT, RankID INT);
INSERT INTO @Employees (Name, RankID) VALUES ('Jonathan', 10),('Christopher', 10),('James', 10),('Jean-Luc', 8);
INSERT INTO @Periods (PeriodID, EmployeeID, RankID) VALUES
(1,1,1),(2,1,1),(3,1,1),(4,1,8 ),(5,1,10),(6,1,10),
(1,2,1),(2,2,1),(3,2,1),(4,2,8 ),(5,2,8 ),(6,2,10),
(1,3,1),(2,3,1),(3,3,7),(4,3,10),(5,3,10),(6,3,10),
(1,4,1),(2,4,1),(3,4,1),(4,4,8 ),(5,4,9 ),(6,4,8 );
Then we can accomplish what I think you're looking for using an OUTER APPLY that aggregates the values based on the current-row values:
SELECT e.EmployeeID, e.Name, e.RankID AS CurrentRank, ap.PeriodID AS ThisPeriod, p.PeriodID AS LastRankChangePeriodID, p.RankID AS LastRankChangedFrom, ap.RankID - p.RankID AS LastRankChanged
FROM @Employees e
LEFT OUTER JOIN @Periods ap
ON e.EmployeeID = ap.EmployeeID
OUTER APPLY (
SELECT EmployeeID, MAX(PeriodID) AS PeriodID
FROM @Periods
WHERE EmployeeID = e.EmployeeID
AND RankID <> ap.RankID
AND PeriodID < ap.PeriodID
GROUP BY EmployeeID
) a
LEFT OUTER JOIN @Periods p
ON a.EmployeeID = p.EmployeeID
AND a.PeriodID = p.PeriodID
ORDER BY e.EmployeeID, ap.PeriodID DESC
Using the correlated subquery we get a view of the data which we can filter using the current-row values, and we aggregate that to return the period we're looking for (where it's before this period, and it's not the same rank). Then it's just a join back to the Periods table to get the values.
You used a LEFT JOIN, so I've preserved that using an OUTER APPLY. If you wanted to filter using it, it would be a CROSS APPLY instead.
EmployeeID Name        CurrentRank ThisPeriod LastRankChangePeriodID LastRankChangedFrom LastRankChanged
---------- ----------- ----------- ---------- ---------------------- ------------------- ---------------
1          Jonathan    10          6          4                      8                   2
1          Jonathan    10          5          4                      8                   2
1          Jonathan    10          4          3                      1                   7
1          Jonathan    10          3          NULL                   NULL                NULL
1          Jonathan    10          2          NULL                   NULL                NULL
1          Jonathan    10          1          NULL                   NULL                NULL
2          Christopher 10          6          5                      8                   2
2          Christopher 10          5          3                      1                   7
2          Christopher 10          4          3                      1                   7
2          Christopher 10          3          NULL                   NULL                NULL
2          Christopher 10          2          NULL                   NULL                NULL
2          Christopher 10          1          NULL                   NULL                NULL
3          James       10          6          3                      7                   3
3          James       10          5          3                      7                   3
3          James       10          4          3                      7                   3
3          James       10          3          2                      1                   6
3          James       10          2          NULL                   NULL                NULL
3          James       10          1          NULL                   NULL                NULL
4          Jean-Luc    8           6          5                      9                   -1
4          Jean-Luc    8           5          4                      8                   1
4          Jean-Luc    8           4          3                      1                   7
4          Jean-Luc    8           3          NULL                   NULL                NULL
4          Jean-Luc    8           2          NULL                   NULL                NULL
4          Jean-Luc    8           1          NULL                   NULL                NULL
Now we can see what the previous change looked like for each period. Jonathan currently has RankID 10. The last time that was different was in PeriodID 4, when it was 8. The same is true for PeriodID 5. In PeriodID 4 he had RankID 8, and prior to that he had RankID 1. Before that his rank hadn't changed.
Jean-Luc was actually demoted as his last change. I don't know if this is possible within your model.
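If the goal is just the list of promotions (EmployeeID, PeriodID, previous rank, new rank), a window function may be simpler. The following is a minimal sketch, assuming SQL Server 2012 or later for LAG and a cut-down copy of the @Periods sample data; the column aliases PreviousRankID and NewRank are taken from the question's desired output. Because the periods are monthly snapshots, several promotions inside one month would show up as a single jump.

```sql
DECLARE @Periods TABLE (PeriodID INT, EmployeeID INT, RankID INT);
INSERT INTO @Periods (PeriodID, EmployeeID, RankID) VALUES
    (1,1,1),(2,1,1),(3,1,1),(4,1,8),(5,1,10),(6,1,10),
    (1,4,1),(2,4,1),(3,4,1),(4,4,8),(5,4,9 ),(6,4,8 );

WITH ranked AS (
    -- Pair every snapshot with the same employee's rank in the previous period
    SELECT EmployeeID, PeriodID, RankID,
           LAG(RankID) OVER (PARTITION BY EmployeeID ORDER BY PeriodID) AS PreviousRankID
    FROM @Periods
)
SELECT EmployeeID, PeriodID, PreviousRankID, RankID AS NewRank
FROM ranked
WHERE RankID > PreviousRankID   -- keeps promotions only; the first period (NULL) and demotions drop out
ORDER BY EmployeeID, PeriodID;
-- Returns (1,4,1,8), (1,5,8,10), (4,4,1,8), (4,5,8,9)
```

A period-range filter such as PeriodID BETWEEN 118 AND 131 would belong in the outer SELECT, so that LAG can still see the snapshot just before the range starts.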

Grouping between two datetimes

I have a bunch of production orders and I'm trying to group by within a datetime range, then count the quantity within that range. For example, I want to group from 2230 to 2230 each day.
PT.ActualFinish is datetime (eg. if PT.ActualFinish is 2020-05-25 23:52:30 then it would be counted on the 26th May instead of the 25th)
Currently it's grouped by date (midnight to midnight) as opposed to the desired 2230 to 2230.
GROUP BY CAST(PT.ActualFinish AS DATE)
I've been trying to reconcile some DATEADD with the GROUP without success. Is it possible?
Just add 1.5 hours (90 minutes) and then extract the date:
group by convert(date, dateadd(minute, 90, pt.ActualFinish))
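To see why the 90-minute shift works: 22:30 plus 90 minutes is exactly midnight, so everything finished at or after 22:30 lands on the next calendar date. A small sketch with made-up rows follows (the table variable @ProductionOrders and its OrderID column are assumptions for illustration; only ActualFinish comes from the question).

```sql
DECLARE @ProductionOrders TABLE (OrderID INT, ActualFinish DATETIME);
INSERT INTO @ProductionOrders (OrderID, ActualFinish) VALUES
    (1, '2020-05-25T22:29:59'),   -- just before the cut-off: stays on 25 May
    (2, '2020-05-25T22:30:00'),   -- exactly at the cut-off: counted on 26 May
    (3, '2020-05-25T23:52:30'),   -- the example from the question: 26 May
    (4, '2020-05-26T10:00:00');   -- 26 May

SELECT CONVERT(date, DATEADD(MINUTE, 90, pt.ActualFinish)) AS ProductionDay,
       COUNT(*) AS Quantity
FROM @ProductionOrders AS pt
GROUP BY CONVERT(date, DATEADD(MINUTE, 90, pt.ActualFinish))
ORDER BY ProductionDay;
-- Returns 2020-05-25 -> 1 and 2020-05-26 -> 3
```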
For this kind of thing you can use a function I created called NGroupRangeAB (code below) which can be used to create groups over values with an upper and lower bound.
Note that this:
SELECT f.*
FROM core.NGroupRangeAB(0,1440,12) AS f
ORDER BY f.RN;
Returns:
RN GroupNumber Low High
--- ------------ ------ -------
0 1 0 120
1 2 121 240
2 3 241 360
3 4 361 480
4 5 481 600
5 6 601 720
6 7 721 840
7 8 841 960
8 9 961 1080
9 10 1081 1200
10 11 1201 1320
11 12 1321 1440
This:
SELECT
f.GroupNumber,
L = DATEADD(MINUTE,f.[Low]-SIGN(f.[Low]),CAST('00:00:00.0000000' AS TIME)),
H = DATEADD(MINUTE,f.[High]-1,CAST('00:00:00.0000000' AS TIME))
FROM core.NGroupRangeAB(0,1440,12) AS f
ORDER BY f.RN;
Returns:
GroupNumber L H
------------- ---------------- ----------------
1 00:00:00.0000000 01:59:00.0000000
2 02:00:00.0000000 03:59:00.0000000
3 04:00:00.0000000 05:59:00.0000000
4 06:00:00.0000000 07:59:00.0000000
5 08:00:00.0000000 09:59:00.0000000
6 10:00:00.0000000 11:59:00.0000000
7 12:00:00.0000000 13:59:00.0000000
8 14:00:00.0000000 15:59:00.0000000
9 16:00:00.0000000 17:59:00.0000000
10 18:00:00.0000000 19:59:00.0000000
11 20:00:00.0000000 21:59:00.0000000
12 22:00:00.0000000 23:59:00.0000000
Now for a real-life example that may help you:
-- Sample Data
DECLARE @table TABLE (tm TIME);
INSERT @table VALUES ('00:15'),('11:20'),('21:44'),('09:50'),('02:15'),('02:25'),
('02:31'),('23:31'),('23:54');
-- Solution:
SELECT
GroupNbr = f.GroupNumber,
TimeLow = f2.L,
TimeHigh = f2.H,
Total = COUNT(t.tm)
FROM core.NGroupRangeAB(0,1440,12) AS f
CROSS APPLY (VALUES(
DATEADD(MINUTE,f.[Low]-SIGN(f.[Low]),CAST('00:00:00.0000000' AS TIME)),
DATEADD(MINUTE,f.[High]-1,CAST('00:00:00.0000000' AS TIME)))) AS f2(L,H)
LEFT JOIN @table AS t
ON t.tm BETWEEN f2.L AND f2.H
GROUP BY f.GroupNumber, f2.L, f2.H;
Returns:
GroupNbr TimeLow TimeHigh Total
-------------------- ---------------- ---------------- -----------
1 00:00:00.0000000 01:59:00.0000000 1
2 02:00:00.0000000 03:59:00.0000000 3
3 04:00:00.0000000 05:59:00.0000000 0
4 06:00:00.0000000 07:59:00.0000000 0
5 08:00:00.0000000 09:59:00.0000000 1
6 10:00:00.0000000 11:59:00.0000000 1
7 12:00:00.0000000 13:59:00.0000000 0
8 14:00:00.0000000 15:59:00.0000000 0
9 16:00:00.0000000 17:59:00.0000000 0
10 18:00:00.0000000 19:59:00.0000000 0
11 20:00:00.0000000 21:59:00.0000000 1
12 22:00:00.0000000 23:59:00.0000000 2
Note that an inner join will eliminate the 0-count rows.
CREATE FUNCTION core.NGroupRangeAB
(
@min BIGINT, -- Group Number Lower boundary
@max BIGINT, -- Group Number Upper boundary
@groups BIGINT -- Number of groups required
)
/*****************************************************************************************
[Purpose]:
Creates an auxiliary table that allows for grouping based on a given set of rows (@rows)
and requested number of "row groups" (@groups). core.NGroupRangeAB can be thought of as a
set-based, T-SQL version of Oracle's WIDTH_BUCKET, which:
"...lets you construct equiwidth histograms, in which the histogram range is divided into
intervals that have identical size. (Compare with NTILE, which creates equiheight
histograms.)" https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions214.htm
See usage examples for more details.
[Author]:
Alan Burstein
[Compatibility]:
SQL Server 2008+
[Syntax]:
--===== Autonomous
SELECT ng.*
FROM dbo.NGroupRangeAB(@rows,@groups) AS ng;
[Parameters]:
@rows = BIGINT; the number of rows to be "tiled" (have group number assigned to it)
@groups = BIGINT; requested number of tile groups (same as the parameter passed to NTILE)
[Returns]:
Inline Table Valued Function returns:
GroupNumber = BIGINT; a row number beginning with 1 and ending with @rows
Members = BIGINT; Number of possible distinct members in the group
Low = BIGINT; the lower-bound range
High = BIGINT; the Upper-bound range
[Dependencies]:
core.rangeAB (iTVF)
[Developer Notes]:
1. An inline derived tally table using a CTE or subquery WILL NOT WORK. NTally requires
a correctly indexed tally table named dbo.tally; if you have or choose to use a
permanent tally table with a different name or in a different schema make sure to
change the DDL for this function accordingly. The recommended number of rows is
1,000,000; below is the recommended DDL for dbo.tally. Note the "Beginning" and "End"
of tally code. To learn more about tally tables see:
http://www.sqlservercentral.com/articles/T-SQL/62867/
2. For best results a P.O.C. index should exist on the table that you are "tiling". For
more information about P.O.C. indexes see:
http://sqlmag.com/sql-server-2012/sql-server-2012-how-write-t-sql-window-functions-part-3
3. NGroupRangeAB is deterministic; for more about deterministic and nondeterministic functions
see https://msdn.microsoft.com/en-us/library/ms178091.aspx
[Examples]:
-----------------------------------------------------------------------------------------
--===== 1. Basic illustration of the relationship between core.NGroupRangeAB and NTILE.
-- Consider this query which assigns 3 "tile groups" to 7 rows:
DECLARE @rows BIGINT = 7, @tiles BIGINT = 3;
SELECT t.N, t.TileGroup
FROM ( SELECT r.RN, NTILE(@tiles) OVER (ORDER BY r.RN)
FROM core.rangeAB(1,@rows,1,1) AS r) AS t(N,TileGroup);
Results:
N TileGroup
--- ----------
1 1
2 1
3 1
4 2
5 2
6 3
7 3
To pivot these "equiheight histograms" into "equiwidth histograms" we could do this:
DECLARE @rows BIGINT = 7, @tiles BIGINT = 3;
SELECT TileGroup = t.TileGroup,
[Low] = MIN(t.N),
[High] = MAX(t.N),
Members = COUNT(*)
FROM ( SELECT r.RN, NTILE(@tiles) OVER (ORDER BY r.RN)
FROM core.rangeAB(1,@rows,1,1) AS r) AS t(N,TileGroup)
GROUP BY t.TileGroup;
Results:
TileGroup Low High Members
---------- ---- ----- -----------
1 1 3 3
2 4 5 2
3 6 7 2
This will return the same thing at a tiny fraction of the cost:
SELECT TileGroup = ng.GroupNumber,
[Low] = ng.[Low],
[High] = ng.[High],
Members = ng.Members
FROM core.NGroupRangeAB(1,@rows,@tiles) AS ng;
--===== 2.1. Divide 25 Rows into 4 groups
DECLARE @min BIGINT = 1, @max BIGINT = 25, @groups BIGINT = 4;
SELECT ng.GroupNumber, ng.Members, ng.low, ng.high
FROM core.NGroupRangeAB(@min,@max,@groups) AS ng;
--===== 2.2. Assign group membership to another table
DECLARE @min BIGINT = 1, @max BIGINT = 25, @groups BIGINT = 4;
SELECT
ng.GroupNumber, ng.low, ng.high, s.WidgetId, s.Price
FROM (VALUES('a',$12),('b',$22),('c',$9),('d',$2)) AS s(WidgetId,Price)
JOIN core.NGroupRangeAB(@min,@max,@groups) AS ng
ON s.Price BETWEEN ng.[Low] AND ng.[High]
ORDER BY ng.RN;
Results:
GroupNumber low high WidgetId Price
------------ ---- ----- --------- ---------------------
1 1 7 d 2.00
2 8 13 a 12.00
2 8 13 c 9.00
4 20 25 b 22.00
-----------------------------------------------------------------------------------------
[Revision History]:
Rev 00 - 20190128 - Initial Creation; Final Tuning - Alan Burstein
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
SELECT
RN = r.RN, -- Sort Key
GroupNumber = r.N2, -- Bucket (group) number
Members = g.S-ur.N+1, -- Count of members in this group
[Low] = r.RN*g.S+rc.N+ur.N, -- Lower boundary for the group (inclusive)
[High] = r.N2*g.S+rc.N -- Upper boundary for the group (inclusive)
FROM core.rangeAB(0,@groups-1,1,0) AS r -- Range Function
CROSS APPLY (VALUES((@max-@min)/@groups,(@max-@min)%@groups)) AS g(S,U) -- Size, Underflow
CROSS APPLY (VALUES(SIGN(SIGN(r.RN-g.U)-1)+1)) AS ur(N) -- get Underflow
CROSS APPLY (VALUES(@min+r.RN-(ur.N*(r.RN-g.U)))) AS rc(N); -- Running Count
GO

How to avoid duplicate values in joining two or three tables?

I have these two tables and I want to join them on their IDs.
Household Info
1
2
3
Household Members
1
1
1
2
3
3
3
3
3
The values repeat over and over again, as you may have noticed in my screenshot. The output I want from the query is this:
Household Info.HID   Household Members.HID
1                    1
                     1
                     1
2                    2
3                    3
                     3
                     3
                     3
                     3
The Household Info table has only 3 HIDs, while the Household Members table has three rows with HID 1, one with HID 2, and five with HID 3.
Hope you can help me on this one :3
EDITED: I am using Microsoft Access as RDBMS
For an RDBMS that supports window functions such as ROW_NUMBER (T-SQL shown here):
DECLARE @Household TABLE
( Household VARCHAR(10))
;
INSERT INTO @Household
( Household )
VALUES
(1), (2), (3)
;
DECLARE @HouseholdMembers TABLE
( HouseholdMembers VARCHAR(10))
;
INSERT INTO @HouseholdMembers
( HouseholdMembers )
VALUES
(1), (1), (1),
(2),
(3), (3), (3), (3), (3)
;
SELECT
    CASE WHEN RN = 1 THEN Household ELSE '' END Household,
    HouseholdMembers
FROM (
    SELECT h.Household,
        hm.HouseholdMembers,
        ROW_NUMBER() OVER (PARTITION BY hm.HouseholdMembers ORDER BY h.Household) RN
    FROM @Household h
    LEFT JOIN @HouseholdMembers hm
        ON hm.HouseholdMembers = h.Household) T
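The query above has no ORDER BY, so the blanked rows are not guaranteed to follow the row that still shows the household ID. A variant that pins the order down is sketched below, assuming the members table has some unique key to sort by (MemberID is a hypothetical column added for this illustration) and using T-SQL rather than Access.

```sql
DECLARE @Household TABLE (HID INT);
INSERT INTO @Household (HID) VALUES (1), (2), (3);

DECLARE @HouseholdMembers TABLE (MemberID INT IDENTITY(1,1), HID INT);
INSERT INTO @HouseholdMembers (HID) VALUES (1),(1),(1),(2),(3),(3),(3),(3),(3);

SELECT CASE WHEN t.RN = 1 THEN CAST(t.InfoHID AS VARCHAR(10)) ELSE '' END AS [Household Info.HID],
       t.MemberHID AS [Household Members.HID]
FROM (
    -- Number the members inside each household so only the first one shows the HID
    SELECT h.HID  AS InfoHID,
           hm.HID AS MemberHID,
           ROW_NUMBER() OVER (PARTITION BY h.HID ORDER BY hm.MemberID) AS RN
    FROM @Household AS h
    LEFT JOIN @HouseholdMembers AS hm
        ON hm.HID = h.HID
) AS t
ORDER BY t.InfoHID, t.RN;   -- keeps each household's rows together, labelled row first
```

Blanking repeated values like this is usually a presentation concern, so doing it in the report or form layer may be the simpler option where one is available.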
You didn't mention which RDBMS you are using.
I think that you can use pivot for your case:
http://www.codeproject.com/Tips/500811/Simple-Way-To-Use-Pivot-In-SQL-Query
or to use grouping:
select c2, c3
, sum( case when no_of_days <= 7 then 1 else 0 end) as dlt8
, sum( case when no_of_days between 8 and 14 then 1 else 0 end) as d8to14
, sum( case when no_of_days between 15 and 21 then 1 else 0 end) as d15to21
, sum( case when no_of_days between 22 and 27 then 1 else 0 end) as d22to27
from mytable
group by c2, c3
order by c2, c3;
Here you can find a similar answer to your question:
Dynamic alternative to pivot with CASE and GROUP BY
Edit 1
If you need something like this:
SubjectID StudentName
---------- -------------
1 Mary
1 John
1 Sam
2 Alaina
2 Edward
Result I expected was:
SubjectID StudentName
---------- -------------
1 Mary, John, Sam
2 Alaina, Edward
you can check this example:
Concatenate many rows into a single text string?
Edit 2
And the last option that I can remember is this one. It's for MySQL but you can reuse the logic:
MySQL JOIN - Return NULL for duplicate results in left table

Sorting varchar field with mixed alphanumeric data

I searched and read a lot of answers on here, but can't find one that will answer my problem, (or help me to find the answer on my own).
We have a table which contains a varchar display field, whose data is entered by the customer.
When we display the results, our customer wants the results to be ordered "correctly".
A sample of what the data could look like is as follows:
"AAA 2 1 AAA"
"AAA 10 1 AAA"
"AAA 10 2 BAA"
"AAA 101 1 AAA"
"BAA 101 2 BBB"
"BAA 101 10 BBB"
"BAA 2 2 AAA"
Sorting by this column ASC returns:
1: "AAA 10 1 AAA"
2: "AAA 10 2 BAA"
3: "AAA 101 1 AAA"
4: "AAA 2 1 AAA"
5: "BAA 101 10 BBB"
6: "BAA 101 2 BBB"
7: "BAA 2 2 AAA"
The customer would like row 4 to actually be the first row (as 2 comes before 10), and similarly row 7 to be between rows 4 and 5, as shown below:
1: "AAA 2 1 AAA"
2: "AAA 10 1 AAA"
3: "AAA 10 2 BAA"
4: "AAA 101 1 AAA"
5: "BAA 2 2 AAA"
6: "BAA 101 10 BBB"
7: "BAA 101 2 BBB"
Now, the real TRICKY bit is, there is no hard and fast rule to what the data will look like in this column; it is entirely down to the customer as to what they put in here (the data shown above is just arbitrary to demonstrate the problem).
Any Help?
EDIT:
learning that this is referred to as "natural sorting" has improved my search results massively
I'm going to give the accepted answer to this question a bash and will update accordingly:
Natural (human alpha-numeric) sort in Microsoft SQL 2005
First create this function:
Create FUNCTION dbo.SplitAndJoin
(
    @delimited nvarchar(max),
    @delimiter nvarchar(100)
) RETURNS Nvarchar(Max)
AS
BEGIN
    declare @res nvarchar(max)
    declare @t TABLE
    (
        -- Id column can be commented out, not required for sql splitting string
        id int identity(1,1), -- I use this column for numbering splitted parts
        val nvarchar(max)
    )
    declare @xml xml
    set @xml = N'<root><r>' + replace(@delimited,@delimiter,'</r><r>') + '</r></root>'
    insert into @t(val)
    select
        r.value('.','varchar(max)') as item
    from @xml.nodes('//root/r') as records(r)
    SELECT @res = STUFF((SELECT ' ' + case when isnumeric(val) = 1 then RIGHT('00000000'+CAST(val AS VARCHAR(8)),8) else val end
        FROM @t
        FOR XML PATH('')), 1, 1, '')
    RETURN @Res
END
GO
This function takes a space-delimited string, splits it into words, and joins them back together with spaces; any word that is numeric is left-padded with zeros to 8 characters.
Then you use this query:
Select * from Test
order by dbo.SplitAndJoin(col1,' ')
Live result on SQL Fiddle
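The reason the padding helps can be shown in isolation. In this small sketch (the values and the column name token are made up for illustration), the plain strings would sort as '10', '101', '2', while the zero-padded versions compare digit by digit at equal length and therefore sort numerically.

```sql
SELECT v.token,
       RIGHT('00000000' + v.token, 8) AS padded
FROM (VALUES ('2'), ('10'), ('101')) AS v(token)
ORDER BY padded;   -- 00000002, 00000010, 00000101
```

The fixed width of 8 is the same assumption the function makes: a number longer than 8 digits would be truncated by RIGHT and sort incorrectly.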
Without consistency, you only have brute force
Without rules, your brute force is limited
I've made some assumptions with this code: if it starts with 3 alpha characters, then a space, then a number (up to 3 digits), let's treat it differently.
There's nothing special about this: it is just string manipulation being brute-forced into giving you "something". Hopefully it illustrates how painful this is without having consistency and rules!
DECLARE @t table (
a varchar(50)
);
INSERT INTO @t (a)
VALUES ('AAA 2 1 AAA')
, ('AAA 10 1 AAA')
, ('AAA 10 2 BAA')
, ('AAA 101 1 AAA')
, ('BAA 101 2 BBB')
, ('BAA 101 10 BBB')
, ('BAA 2 2 AAA')
, ('Completely different')
;
; WITH step1 AS (
SELECT a
, CASE WHEN a LIKE '[A-Z][A-Z][A-Z] [0-9]%' THEN 1 ELSE 0 END As fits_pattern
, CharIndex(' ', a) As first_space
FROM @t
)
, step2 AS (
SELECT *
, CharIndex(' ', a, first_space + 1) As second_space
, CASE WHEN fits_pattern = 1 THEN Left(a, 3) ELSE 'ZZZ' END As first_part
, CASE WHEN fits_pattern = 1 THEN SubString(a, first_space + 1, 1000) ELSE 'ZZZ' END As rest_of_it
FROM step1
)
, step3 AS (
SELECT *
, CASE WHEN fits_pattern = 1 THEN SubString(rest_of_it, 1, second_space - first_space - 1) ELSE 'ZZZ' END As second_part
FROM step2
)
SELECT *
, Right('000' + second_part, 3) As second_part_formatted
FROM step3
ORDER
BY first_part
, second_part_formatted
, a
;
Relevant, sorted results:
a
---------------------
AAA 2 1 AAA
AAA 10 1 AAA
AAA 10 2 BAA
AAA 101 1 AAA
BAA 2 2 AAA
BAA 101 10 BBB
BAA 101 2 BBB
Completely different
This code can be vastly improved/shortened. I've just left it verbose in order to give you some clarity over the steps taken.

SQL Server 2008 / Reporting Services query

I need some help with my recursive query to get a direct count (all members(children) directly) and total count (all team members) for my SSRS report.
Here is my current query and the result set.
WITH AgentHierarchy([Name], AId, UId, HLevel, ContractDate)
AS
(SELECT
FirstName + ' ' + LastName AS Name, AId, UId,
0 AS HLevel, ContractDate
FROM tbl_Asso
WHERE (AId ='A049')
UNION ALL
SELECT
e.FirstName + ' ' + e.LastName AS Name,
e.AId, e.UId,
eh.HLevel + 1 AS HLevel, e.ContractDate
FROM
tbl_Asso AS e
INNER JOIN
AgentHierarchy AS eh ON eh.AId = e.UId)
SELECT
AId, Name,
(select u.FirstName + ' ' + u.LastName
from tbl_Asso u
where u.AId = d.UId) as Upline,
UId,
HLevel,
ContractDate,
(Select count(*)
from tbl_Asso as dc
where dc.UId = d.AId) As DirectCount
FROM
AgentHierarchy AS d
ORDER BY
HLevel
the current result set
AId Name Upline UId HLevel ContractDate DirectCount
-----------------------------------------------------------------------
A049 King Bori Cindy Hoss A001 0 8/29/2012 5
A052 Kac Marque King Bori A049 1 11/6/2012 0
A050 Joseph Moto King Bori A049 1 10/9/2012 1
A059 Nancy Ante King Bori A049 1 3/27/2013 1
A053 Kathy May King Bori A049 1 11/15/2012 2
A057 Robert Murphy King Bori A049 1 2/12/2013 1
A051 Andy Jane Joseph Moto A050 2 2/14/2013 0
A060 Arian Colle Nancy Ante A059 2 3/26/2013 0
A058 Phil Hunk Robert Murphy A057 2 3/21/2013 0
A055 Rea Wane Kathy May A053 2 2/20/2013 1
A054 Gabby Orez Kathy May A053 2 12/7/2012 0
A056 Steve Wells Rea Wane A055 3 3/25/2013 0
I need to change the above query to get the direct count (all direct members, i.e. children) and the total team count based on the contract date.
For example, for a contract date between 03/01/2013 and 03/31/2013, I need to get the result set below.
I need to incorporate a parameter for the contract date, so that users can supply a range or, if it is null, get all the records and the counts, e.g.
(ContractDate between @Begindate and @Enddate) or ((@Begindate is null) and (@Enddate is null))
AId Name Upline UId HLevel ContractDate DirectCount TotalTeam
---------------------------------------------------------------------------------
A049 King Bori Cindy Hoss A001 0 8/29/2012 1 4
A052 Kac Marque King Bori A049 1 11/6/2012 0 0
A050 Joseph Moto King Bori A049 1 10/9/2012 0 0
A059 Nancy Ante King Bori A049 1 3/27/2013 1 1
A053 Kathy May King Bori A049 1 11/15/2012 0 0
A057 Robert Murphy King Bori A049 1 2/12/2013 1 1
A051 Andy Jane Joseph Moto A050 2 2/14/2013 0 0
A060 Arian Colle Nancy Ante A059 2 3/26/2013 0 0
A058 Phil Hunk Robert Murphy A057 2 3/21/2013 0 0
A055 Rea Wane Kathy May A053 2 2/20/2013 1 1
A054 Gabby Orez Kathy May A053 2 12/7/2012 0 0
A056 Steve Wells Rea Wane A055 3 3/25/2013 0 0
Thanks In advance.
I am not certain, but you are explicitly listing a single person in your recursive CTE, which limits the scope to just that person and the rows linked to them. Unless you are recursing over a starting set of millions of records, you should be able to apply the predicate at the bottom, in the outer query, rather than inside the recursive CTE, provided you handle the maximum recursion level properly. For your contract date, leave that out until the bottom, where you consume the recursion, unless you need to get the levels first. In that case I would get that data in a first CTE, then write a second one that does the recursion on it.
Here is a simple example that includes combined sales. Basically, I form the recursion without being ID-specific, find the max recursion depth (you can leave that part out if you want) to find the lowest leaf level, and then apply the end predicates needed. I hope this helps. A lot of times I see people putting predicates in their recursive CTE that limit its scope; keep in mind that recursion is, in essence, layering a query over itself n times. You can get the data you need before and after that point, but predicates inside the recursion will limit where that scope leads.
Declare @Table table ( PersonId int identity, PersonName varchar(512), Account int, ParentId int, Orders int);
insert into @Table values ('Brett', 1, NULL, 1000),('John', 1, 1, 100),('James', 1, 1, 200),('Beth', 1, 2, 300),('John2', 2, 4, 400);
select
PersonID
, PersonName
, Account
, ParentID
from @Table
; with recursion as
(
select
t1.PersonID
, t1.PersonName
, t1.Account
--, t1.ParentID
, cast(isnull(t2.PersonName, '')
+ Case when t2.PersonName is not null then '\' + t1.PersonName else t1.PersonName end
as varchar(255)) as fullheirarchy
, 1 as pos
, cast(t1.orders +
isnull(t2.orders,0) -- if the parent has no orders then zero
as int) as Orders
from @Table t1
left join @Table t2 on t1.ParentId = t2.PersonId
union all
select
t.PersonID
, t.PersonName
, t.Account
--, t.ParentID
, cast(r.fullheirarchy + '\' + t.PersonName as varchar(255))
, pos + 1 -- increases
, r.orders + t.orders
from @Table t
join recursion r on t.ParentId = r.PersonId
)
, b as
(
select *, max(pos) over(partition by PersonID) as maxrec -- I find the maximum occurrence of position by person
from recursion
)
select *
from b
where pos = maxrec -- finds the furthest down tree
-- and Account = 2 -- I could find just someone from a different department
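Applied to the question's table, the same idea (recurse without filters, then apply the date predicate at the end) might look like the sketch below. It is only an illustration under assumptions: @tbl_Asso is a cut-down stand-in for tbl_Asso with four of the question's rows, the CTE name Team and the helper column InRange are made up, and "total team" is taken to mean all descendants at any depth whose contract date falls in the range.

```sql
DECLARE @Begindate DATE = '20130301', @Enddate DATE = '20130331';

DECLARE @tbl_Asso TABLE (AId VARCHAR(10), UId VARCHAR(10), ContractDate DATE);
INSERT INTO @tbl_Asso (AId, UId, ContractDate) VALUES
    ('A049', 'A001', '20120829'),
    ('A050', 'A049', '20121009'),
    ('A059', 'A049', '20130327'),
    ('A060', 'A059', '20130326');

WITH Team AS (
    -- Anchor: every agent paired with each of its direct children
    SELECT p.AId AS RootAId, c.AId AS MemberAId, c.ContractDate, 1 AS Depth
    FROM @tbl_Asso AS p
    JOIN @tbl_Asso AS c ON c.UId = p.AId
    UNION ALL
    -- Recursive step: walk down from each team member to its own children
    SELECT t.RootAId, c.AId, c.ContractDate, t.Depth + 1
    FROM Team AS t
    JOIN @tbl_Asso AS c ON c.UId = t.MemberAId
)
SELECT a.AId,
       SUM(CASE WHEN t.Depth = 1 AND f.InRange = 1 THEN 1 ELSE 0 END) AS DirectCount,
       SUM(f.InRange) AS TotalTeam
FROM @tbl_Asso AS a
LEFT JOIN Team AS t ON t.RootAId = a.AId
-- The date predicate is applied here, outside the recursion
CROSS APPLY (VALUES (
    CASE WHEN t.MemberAId IS NOT NULL
          AND ((@Begindate IS NULL AND @Enddate IS NULL)
               OR t.ContractDate BETWEEN @Begindate AND @Enddate)
         THEN 1 ELSE 0 END)) AS f(InRange)
GROUP BY a.AId
ORDER BY a.AId;
-- With these four rows: A049 -> 1, 2; A050 -> 0, 0; A059 -> 1, 1; A060 -> 0, 0
```

Because the filter sits outside the recursion, a member contracted inside the range is still counted for an upline even when the agents between them were contracted outside it.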
