SQL, Split by 1 column and duplicate other columns - sql-server

Good day!
Maybe you can help me, or tell me if what I want to do is impossible or totally wrong...
I was trying to create a sqlfiddle but it seems the page is down at the moment.
(SQL Server 2008) I have a table, let's say with 3 columns, that the designer didn't normalize, so one column holds multiple values. It looks something like this:
IdCol Col1 Col2 Col3
1 a1 b1 a, b, c
2 a2 b2 d, e, f
As you can see, Col3 holds multiple values separated by ",".
What I want to achieve is a view (I can't modify the table because they won't allow me to modify the application) that looks something like this:
NewIdCol IdCol Col1 Col2 Col3
1 1 a1 b1 a
2 1 a1 b1 b
3 1 a1 b1 c
4 2 a2 b2 d
5 2 a2 b2 e
6 2 a2 b2 f
So in the final result each Col3 value gets its own row, with every other column's value copied. (The actual table has about 20 columns, and 2 of those columns hold multiple values, so I would need to do it for both columns.)
At first I thought it would be easy... but then I hit a block on how to split that string... first I thought about using a split function, but then I didn't know how to join it back with the rest of the columns...
Thanks in advance.

You need to have a function for splitting comma-delimited strings into separate rows. Then you call the function like this:
SELECT
NewIdCol = ROW_NUMBER() OVER(ORDER BY t.IdCol, x.ItemNumber),
t.IdCol,
t.Col1,
t.Col2,
x.Item
FROM Test t
CROSS APPLY [dbo].[DelimitedSplit8K](t.Col3, ',') x
Here is the DelimitedSplit8K function by Jeff Moden.
CREATE FUNCTION [dbo].[DelimitedSplit8K](
@pString NVARCHAR(4000), @pDelimiter NCHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,E2(N) AS (SELECT 1 FROM E1 a, E1 b)
,E4(N) AS (SELECT 1 FROM E2 a, E2 b)
,cteTally(N) AS(
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
,cteStart(N1) AS(
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(
SELECT
s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
SELECT
ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
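Since the question asks for a view, the query above could be wrapped roughly like this. It is only a sketch under assumptions: the view name dbo.TestSplit is made up, dbo.Test is the table name used in the answer, and the LTRIM/RTRIM is there because the sample data has a space after each comma.

```sql
-- Hypothetical view name; dbo.Test and dbo.DelimitedSplit8K are taken from the answer above.
CREATE VIEW dbo.TestSplit
AS
SELECT
    NewIdCol = ROW_NUMBER() OVER (ORDER BY t.IdCol, x.ItemNumber),
    t.IdCol,
    t.Col1,
    t.Col2,
    Col3 = LTRIM(RTRIM(x.Item))  -- strip the blank that follows each comma in the sample data
FROM dbo.Test AS t
CROSS APPLY dbo.DelimitedSplit8K(t.Col3, ',') AS x;
GO

-- Usage: one row per value that was packed into Col3.
SELECT NewIdCol, IdCol, Col1, Col2, Col3
FROM dbo.TestSplit
ORDER BY NewIdCol;
```

Adding a second CROSS APPLY for the other multi-valued column would pair every value of one column with every value of the other, so whether that is the wanted result depends on how the two columns relate.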

Related

T-SQL: Need Top N to always return N rows, even if null or blank

Typically, the command
Select Top 5 * FROM ourTable
will return up to 5 rows, but fewer if fewer rows exist.
I want to ensure that it always returns 5 rows, (or in general N rows).
What is the syntax to achieve this?
The idea is to sort of generalize the LINQ concept of "FirstOrDefault" to "First_N_OrDefault", but using TSQL not LINQ.
Clearly, the 'extra' rows would have null or empty columns.
This is for Microsoft SQL Server 2014 using SSMS 14.0.17
I want to use the "TOP" syntax, if at all possible, therefore it is different than the possible duplicate. Also, as noted below, this is possibly something that could be solved at a different layer in the system, but it would be nice to have for TSQL as well.
select top (5) c1, c2, c3 from (
select top (5) c1, c2, c3, 0 as priority from ourTable
union all
select c1, c2, c3, 1 from (values (null, null, null), (null, null, null), (null, null, null), (null, null, null), (null, null, null)) v (c1, c2, c3)
) t
order by priority
You can use another dummy table with rows to generate empty rows of your table with a not matching JOIN. So you don't have to repeat the columns and rows in the UNION ALL part:
SELECT TOP 5 * FROM (
SELECT 0 AS isDummy, * FROM table_name
-- WHERE column_name = value
UNION ALL
SELECT 1 AS isDummy, t1.* FROM table_name t1
RIGHT JOIN INFORMATION_SCHEMA.COLUMNS ON t1.id = -1000 -- not valid condition so t1 columns are empty.
) t2
ORDER BY isDummy ASC
In this case the INFORMATION_SCHEMA.COLUMNS table is used to generate the additional rows. You can choose any other table with rows. You can use a TOP N value up to the count of rows in the right table (here: INFORMATION_SCHEMA.COLUMNS).
You can also generate a table with many rows (like on a calendar table):
SELECT TOP 5 * FROM (
SELECT 0 isDummy, * FROM table_name
-- WHERE column_name = value
UNION ALL
SELECT 1 isDummy, t1.* FROM table_name t1 RIGHT JOIN (
SELECT * FROM
(SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0,
(SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1,
(SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2,
(SELECT 0 t3 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t3,
(SELECT 0 t4 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t4
) t2 ON t1.id = -1000 -- not valid condition so t1 columns are empty.
)x
ORDER BY isDummy ASC
You can use a limited tally table together with a gaplessly generated row number.
The SELECT is always the same. The only thing that changes is the number of rows in the mock-up table:
DECLARE @TopCount INT=5;
--Case 1: More than 5 rows in the table
DECLARE @tbl TABLE(ID INT IDENTITY,SomeValue VARCHAR(100));
INSERT INTO @tbl VALUES
('Value1'),('Value2'),('Value3'),('Value4'),('Value5'),('Value6'),('Value7');
WITH Tally(Nmbr) AS(SELECT TOP(@TopCount) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
,NumberedRows AS(SELECT ROW_NUMBER() OVER(ORDER BY ID) AS GeneratedRowNumber, * FROM @tbl)
SELECT *
FROM NumberedRows nr
FULL OUTER JOIN Tally t ON nr.GeneratedRowNumber=t.Nmbr;
--Case 2: Less than 5 rows in the table
DELETE FROM @tbl WHERE ID BETWEEN 2 AND 5;
WITH Tally(Nmbr) AS(SELECT TOP(@TopCount) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
,NumberedRows AS(SELECT ROW_NUMBER() OVER(ORDER BY ID) AS GeneratedRowNumber, * FROM @tbl)
SELECT *
FROM NumberedRows nr
FULL OUTER JOIN Tally t ON nr.GeneratedRowNumber=t.Nmbr;
--Case 3: Exactly one row in the table
DELETE FROM @tbl WHERE ID <> 6;
WITH Tally(Nmbr) AS(SELECT TOP(@TopCount) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
,NumberedRows AS(SELECT ROW_NUMBER() OVER(ORDER BY ID) AS GeneratedRowNumber, * FROM @tbl)
SELECT *
FROM NumberedRows nr
FULL OUTER JOIN Tally t ON nr.GeneratedRowNumber=t.Nmbr;
--Case 4: Table is empty
DELETE FROM @tbl;
WITH Tally(Nmbr) AS(SELECT TOP(@TopCount) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values)
,NumberedRows AS(SELECT ROW_NUMBER() OVER(ORDER BY ID) AS GeneratedRowNumber, * FROM @tbl)
SELECT *
FROM NumberedRows nr
FULL OUTER JOIN Tally t ON nr.GeneratedRowNumber=t.Nmbr;
This will return all rows from the source, but at least the specified count.
If you want to limit the set to exactly 5 rows (e.g. in "Case 1"), you can use SELECT TOP(@TopCount) * and place an appropriate ORDER BY. This would return the specified row count in any case.
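As a rough sketch of that last remark, the query below caps the result at exactly @TopCount rows. The two-row sample data and the COALESCE ordering expression are assumptions made for illustration, not part of the answer.

```sql
DECLARE @TopCount INT = 5;
DECLARE @tbl TABLE (ID INT IDENTITY, SomeValue VARCHAR(100));
INSERT INTO @tbl VALUES ('Value1'), ('Value2');  -- fewer rows than @TopCount

WITH Tally(Nmbr) AS (
    SELECT TOP (@TopCount) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM master..spt_values
),
NumberedRows AS (
    SELECT ROW_NUMBER() OVER (ORDER BY ID) AS GeneratedRowNumber, *
    FROM @tbl
)
SELECT TOP (@TopCount) t.Nmbr, nr.ID, nr.SomeValue
FROM NumberedRows AS nr
FULL OUTER JOIN Tally AS t ON nr.GeneratedRowNumber = t.Nmbr
-- Real rows and padding rows share one numbering, so ordering by it keeps real rows first.
ORDER BY COALESCE(nr.GeneratedRowNumber, t.Nmbr);
```

With two source rows this should return five rows: the two real ones followed by three rows whose ID and SomeValue are NULL. With more than five source rows the TOP cuts the result at five.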

Generate combinations in SQL Server

I need to generate combinations of 3, 4, 5, 6 and 7 numbers from a string of numbers.
For example, take this string:
01;05;06;03;02;10;11;
It contains 7 numbers. For combinations of 3 numbers there are 35 results, and they should follow the order of the numbers in the string,
like:
01;05;06;|
01;05;03;|
01;05;02;|
01;05;10;|
01;05;11;|
01;06;03;|
01;06;02;|
01;06;10;|
01;06;11;|
01;03;02;|
01;03;10;|
01;03;11;|
01;02;10;|
01;02;11;|
01;10;11;|
05;06;03;|
05;06;02;|
05;06;10;|
05;06;11;|
05;03;02;|
05;03;10;|
05;03;11;|
05;02;10;|
05;02;11;|
05;10;11;|
06;03;02;|
06;03;10;|
06;03;11;|
06;02;10;|
06;02;11;|
06;10;11;|
03;02;10;|
03;02;11;|
03;10;11;|
02;10;11;|
You can do this with two inner joins after splitting the string.
rextester: http://rextester.com/JJGKI77804
String Splitter for the test:
/* Jeff Moden's http://www.sqlservercentral.com/articles/Tally+Table/72993/ */
create function dbo.DelimitedSplitN4K (@pString nvarchar(4000), @pDelimiter nchar(1))
returns table with schemabinding as
return
with e1(n) as (
select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all
select 1 union all select 1 union all select 1 union all select 1
)
, e2(n) as (select 1 from e1 a, e1 b)
, e4(n) as (select 1 from e2 a, e2 b)
, cteTally(n) as (select top (isnull(datalength(@pString)/2,0))
row_number() over (order by (select null)) from e4)
, cteStart(n1) as (select 1 union all
select t.n+1 from cteTally t where substring(@pString,t.n,1) = @pDelimiter)
, ctelen(n1,l1) as(select s.n1
, isnull(nullif(charindex(@pDelimiter,@pString,s.n1),0)-s.n1,4000)
from cteStart s
)
select Itemnumber = row_number() over(order by l.n1)
, Item = substring(@pString, l.n1, l.l1)
from ctelen l;
go
the query
declare @str nvarchar(4000)= '01;05;06;03;02;10;11;';
with cte as (
select ItemNumber, Item
from dbo.DelimitedSplitN4K(@str,';')
where Item != ''
)
select combo=a.Item+';'+b.Item+';'+c.Item
from cte as a
inner join cte as b on a.ItemNumber<b.ItemNumber
inner join cte as c on b.ItemNumber<c.ItemNumber
order by a.ItemNumber, b.ItemNumber, c.ItemNumber;
ordered by ItemNumber results:
01;05;06
01;05;03
01;05;02
01;05;10
01;05;11
01;06;03
01;06;02
01;06;10
01;06;11
01;03;02
01;03;10
01;03;11
01;02;10
01;02;11
01;10;11
05;06;03
05;06;02
05;06;10
05;06;11
05;03;02
05;03;10
05;03;11
05;02;10
05;02;11
05;10;11
06;03;02
06;03;10
06;03;11
06;02;10
06;02;11
06;10;11
03;02;10
03;02;11
03;10;11
02;10;11
If you want to return a single pipe-delimited string:
with cte as (
select ItemNumber, Item
from dbo.DelimitedSplitN4K(@str,';')
where Item != ''
)
select combo=stuff(
(select '|'+a.Item+';'+b.Item+';'+c.Item
from cte as a
inner join cte as b on a.ItemNumber<b.ItemNumber
inner join cte as c on b.ItemNumber<c.ItemNumber
order by a.ItemNumber, b.ItemNumber, c.ItemNumber
for xml path (''), type).value('.','nvarchar(max)')
,1,1,'')
results:
01;05;06|01;05;03|01;05;02|01;05;10|01;05;11|01;06;03|01;06;02|01;06;10|01;06;11|01;03;02|01;03;10|01;03;11|01;02;10|01;02;11|01;10;11|05;06;03|05;06;02|05;06;10|05;06;11|05;03;02|05;03;10|05;03;11|05;02;10|05;02;11|05;10;11|06;03;02|06;03;10|06;03;11|06;02;10|06;02;11|06;10;11|03;02;10|03;02;11|03;10;11|02;10;11
splitting strings reference:
Tally OH! An Improved SQL 8K “CSV Splitter” Function
Splitting Strings : A Follow-Up - Aaron Bertrand
Split strings the right way – or the next best way
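The question also asks for combinations of 4 to 7 numbers. One way to avoid writing a separate query per size might be a recursive CTE, sketched below. This is an assumption-laden illustration rather than part of the answer: the @k parameter, the @items table variable and the SortKey column are all made up here, and the split result is materialized first so that the recursive part only reads a plain table.

```sql
declare @str nvarchar(4000) = N'01;05;06;03;02;10;11;';
declare @k int = 4;  -- hypothetical parameter: combination size (3 to 7 in the question)

-- Materialize the split items so the recursive CTE only touches a simple table.
declare @items table (ItemNumber int primary key, Item nvarchar(4000));
insert into @items (ItemNumber, Item)
select ItemNumber, Item
from dbo.DelimitedSplitN4K(@str, N';')
where Item != N'';

with combos as (
    -- anchor: every item starts a combination of size 1
    select LastItem = ItemNumber
         , Size = 1
         , Combo = cast(Item as nvarchar(4000))
         , SortKey = cast(right('000' + cast(ItemNumber as varchar(10)), 4) as varchar(8000))
    from @items
    union all
    -- recursive step: append only items that appear later in the string
    select i.ItemNumber
         , c.Size + 1
         , cast(c.Combo + N';' + i.Item as nvarchar(4000))
         , cast(c.SortKey + right('000' + cast(i.ItemNumber as varchar(10)), 4) as varchar(8000))
    from combos as c
    inner join @items as i on i.ItemNumber > c.LastItem
    where c.Size < @k
)
select Combo
from combos
where Size = @k
order by SortKey;  -- zero-padded positions keep the order of the numbers in the string
```

The explicit casts are there because the anchor and recursive members of a recursive CTE must produce identical column types. The zero-padded SortKey assumes fewer than 10,000 items.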
I had nearly the same query, but the result is somewhat different, because it compares the values themselves rather than their positions in the string.
Please check:
/*
create table Combination (id char(2))
insert into Combination values ('01'),('05'),('06'),('03'),('02'),('10'),('11')
*/
select c1.id, c2.id, c3.id, c1.id + ';' + c2.id + ';' + c3.id Combination
from Combination c1, Combination c2, Combination c3
where
c2.id between c1.id and c3.id
and c1.id <> c2.id
and c2.id <> c3.id
order by c1.id, c2.id, c3.id
The output is

Get a max record for each unique column value in a table

I have a database table like this
A || B || C
------------------------------------------
1 ABC 10
1 XYZ 5
2 EFG 100
2 LMN 150
2 WER 50
3 ABC 50
3 XYZ 75
Now I want a result set like this, containing the row with the max value of column C for each value in column A:
A || B || C
-----------------------------------------
1 ABC 10
2 LMN 150
3 XYZ 75
I have tried using DISTINCT and MAX(), like this, but it did not work:
select distinct #table.A,#table.B,MAX(#table.C) from #table group by #table.A,#table.B
Is there a simple way to achieve this?
Using MAX() as a window function:
SELECT t.A, t.B, t.C
FROM
(
SELECT A, B, C, MAX(C) OVER (PARTITION BY A) max_C
FROM yourTable
) t
WHERE t.C = t.max_C
If you want to retrieve only a single max record for each group of A values, then you should use the method suggested by @GurV, which is the row number:
SELECT t.A, t.B, t.C
FROM
(
SELECT A, B, C, ROW_NUMBER() OVER (PARTITION BY A ORDER BY C DESC, B DESC) row_num
FROM yourTable
) t
WHERE t.row_num = 1
Note carefully the ORDER BY C DESC, B DESC inside the call to ROW_NUMBER(). This places the max C records at the top of each partition and breaks ties by descending B value, so only one record per A value is retained.
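The difference between the two approaches only shows up when two rows tie for the maximum. A minimal sketch, using made-up sample rows in a table variable named @t (not from the question):

```sql
-- Hypothetical data: two rows in group A = 3 tie for the max C value.
DECLARE @t TABLE (A INT, B VARCHAR(10), C INT);
INSERT INTO @t (A, B, C) VALUES (3, 'ABC', 75), (3, 'XYZ', 75), (3, 'DEF', 50);

-- MAX() OVER keeps every row tied for the maximum: both ABC and XYZ come back.
SELECT t.A, t.B, t.C
FROM (
    SELECT A, B, C, MAX(C) OVER (PARTITION BY A) AS max_C
    FROM @t
) AS t
WHERE t.C = t.max_C;

-- ROW_NUMBER() keeps exactly one row per A: XYZ, because B DESC breaks the tie.
SELECT t.A, t.B, t.C
FROM (
    SELECT A, B, C, ROW_NUMBER() OVER (PARTITION BY A ORDER BY C DESC, B DESC) AS row_num
    FROM @t
) AS t
WHERE t.row_num = 1;
```

Which behaviour is right depends on whether tied rows should all be reported or only one of them.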
If you order by both C and B the combination of both may or may not give you the highest value of Column C. So I feel the below query should work for your specific requirement.
SELECT t.A, t.B, t.C
FROM
(
SELECT A, B, C, ROW_NUMBER() OVER (PARTITION BY A ORDER BY C DESC) row_num
FROM yourTable
) t
WHERE t.row_num = 1
You can use window function to do this:
select * from (select
t.*,
row_number() over (partition by A order by C desc) rn
from your_table t) t where rn = 1;
If those aren't supported, use JOIN:
select t1.*
from your_table t1
inner join (
select A, max(C) C
from your_table
group by A
) t2 on t1.A = t2.A
and t1.C = t2.C;
Just an another way with a simple Join and Group BY
Schema:
SELECT * INTO #TAB1 FROM (
SELECT 1 A, 'ABC' B , 10 C
UNION ALL
SELECT 1 , 'XYZ' , 5
UNION ALL
SELECT 2 , 'EFG' , 100
UNION ALL
SELECT 2 , 'LMN' , 150
UNION ALL
SELECT 2 , 'WER' , 50
UNION ALL
SELECT 3 , 'ABC' , 50
UNION ALL
SELECT 3 , 'XYZ' , 75
)A
Do join to sub query
SELECT C2.A,C1.B, C2.MC
FROM #TAB1 C1
INNER JOIN
(
SELECT A, MAX(C) MC
FROM #TAB1
GROUP BY A
)AS C2 ON C1.A=C2.A AND C1.C= C2.MC
And the result will be
+---+-----+-----+
| A | B | MC |
+---+-----+-----+
| 1 | ABC | 10 |
| 2 | LMN | 150 |
| 3 | XYZ | 75 |
+---+-----+-----+

INSERT INTO slows down table-valued FUNCTION

On the SQL Server 2008 side I have a table-valued function that receives 45k integer ids merged into a single VARBINARY(MAX), splits them, and returns them as a table. SplitIds takes up to 5s. According to the estimated execution plan, 100% of the cost is 'Table Insert'. Is it possible to speed up this function?
ALTER FUNCTION [dbo].[SplitIds](@data VARBINARY(MAX))
RETURNS @result TABLE(Id INT NOT NULL)
AS
BEGIN
IF @data IS NULL
RETURN
DECLARE @ptr INT = 0, @size INT = 4
WHILE @ptr * @size < LEN(@data)
BEGIN
INSERT INTO @result(Id)
VALUES(SUBSTRING(@data, @ptr * @size + 1, @size))
SET @ptr += 1
END
RETURN
END
Currently on the C# side it is used in a Linq-to-SQL query as follows:
XDbOrder[] orders =
database.SplitIds(ConvertToVarbinary(orderIds)).
Join(
database.Get<XDbOrder>(),
r => r.Id,
o => o.Id,
(r, o) => o).
ToArray();
A more general question: is it possible in Linq-to-SQL to implement the following without SplitIds? .Contains does not work: it creates a query with more than 2100 SQL parameters and crashes.
int[] orderIds = { ... 45k random entries .....};
XDbOrder[] orders =
database.Get<XDbOrder>().
Where(o => orderIds.Contains(o.Id)).
ToArray();
You could try a more set based approach.
(I've kept the multi statement TVF approach because the inline approach to generating a table of numbers works well in isolation but the execution plans when incorporated into a larger query can be quite catastrophically bad - this ensures that the split happens once and only once)
I've also added a Primary Key to the return table so it contains a useful index.
CREATE FUNCTION [dbo].[SplitIds](@data VARBINARY(MAX))
RETURNS @result TABLE(Id INT NOT NULL PRIMARY KEY WITH (IGNORE_DUP_KEY=ON))
AS
BEGIN
IF @data IS NULL
RETURN
DECLARE @size INT = 4;
WITH E1(N)
AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 1*10^1 or 10 rows
E2(N)
AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows
E4(N)
AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^4 or 10,000 rows
E8(N)
AS (SELECT 1 FROM E4 a, E4 b), -- 1*10^8 or 100,000,000 rows
Nums(N)
AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
FROM E8)
INSERT INTO @result
(Id)
SELECT TOP (DATALENGTH(@data)/@size) SUBSTRING(@data, N * @size + 1, @size)
FROM Nums
RETURN
END
The following completes in about 160ms for me
DECLARE @data VARBINARY(MAX) = 0x
WHILE DATALENGTH(@data) < 184000
SET @data = @data + CRYPT_GEN_RANDOM(8000)
SELECT COUNT(*)
FROM [dbo].[SplitIds](@data)
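To make the input format concrete, here is a small sketch that packs three known ids and splits them again. It assumes each id is stored as 4 big-endian bytes, which is what CAST(int AS BINARY(4)) produces. Whether the ConvertToVarbinary helper on the C# side uses the same byte order is not shown in the question, so treat that as an assumption.

```sql
-- Three sample ids packed as 4 bytes each (big-endian, as produced by CAST ... AS BINARY(4)).
DECLARE @data VARBINARY(MAX) =
      CAST(1     AS BINARY(4))   -- 0x00000001
    + CAST(258   AS BINARY(4))   -- 0x00000102
    + CAST(70000 AS BINARY(4));  -- 0x00011170

-- Each 4-byte slice is implicitly converted back to INT by the function's result table.
SELECT Id
FROM [dbo].[SplitIds](@data)
ORDER BY Id;
```

This should return 1, 258 and 70000. Because the result table is declared with IGNORE_DUP_KEY=ON, repeated ids in the input would come back only once.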
Here is my version of set based approach
create FUNCTION [dbo].[SplitIds1](@data VARBINARY(MAX))
returns table with SCHEMABINDING
as
return
WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2), -- 10*100
e4(n) AS (SELECT 1 FROM e3 A CROSS JOIN e3 B), -- 1000*1000
Numbers(ptr,Size) AS (SELECT ROW_NUMBER() OVER (ORDER BY n)-1,4 FROM e4)
SELECT SUBSTRING(@data, ptr * Size + 1, Size) as Id
FROM Numbers
WHERE ptr * Size < LEN(@data)
A few notes about my approach:
Adding SCHEMABINDING to the function avoids an unnecessary Table Spool operator in the execution plan.
I also removed the @size variable, since the size is hard-coded inside the function.
I changed the multi-statement table-valued function to an inline table-valued function, which lets you see the execution plan of the SELECT statement inside the function, just like any view or SELECT query.

Insert row for each integer between 0 and <value> without cursor

I have a source table with id and count.
id count
a 5
b 2
c 31
I need to populate a destination table with each integer up to the count for each id.
id value
a 1
a 2
a 3
a 4
a 5
b 1
b 2
c 1
c 2
etc...
My current solution is like so:
INSERT INTO destination (id,value)
SELECT
source.id,
sequence.number
FROM
(VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9)) AS sequence(number)
INNER JOIN
source ON sequence.number <= source.count
This solution has an upper limit and is plain lame. Is there any way to replace the sequence with a set of all integers? Or another solution that does not use looping?
this should work:
WITH r AS (
SELECT id, count, 1 AS n FROM SourceTable
UNION ALL
SELECT id, count, n+1 FROM r WHERE n<count
)
SELECT id,n FROM r
order by id,n
OPTION (MAXRECURSION 0)
Unfortunately, there is no set of all integers in SQL Server. However, using a little trickery, you can easily generate such a set:
select N from (
select ROW_NUMBER() OVER (ORDER BY t1.object_id) AS N
from sys.all_objects t1, sys.all_objects t2
) AS numbers
where N between 1 and 1000000
This will generate a set of all numbers from 1 through 1000000. If you need more than a few million numbers, add sys.all_objects to the cross join a third time.
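Plugged into the original INSERT, the generated numbers could replace the hard-coded VALUES list roughly as follows. This is a sketch that reuses the source and destination table names from the question; the alias names are arbitrary.

```sql
INSERT INTO destination (id, value)
SELECT s.id, n.N
FROM source AS s
INNER JOIN (
    -- Number generator from the answer above: one row per pair of system objects.
    SELECT ROW_NUMBER() OVER (ORDER BY t1.object_id) AS N
    FROM sys.all_objects AS t1
    CROSS JOIN sys.all_objects AS t2
) AS n
    ON n.N <= s.[count];  -- keep 1..count for each id
```

The upper limit is now the square of the number of rows in sys.all_objects, which varies by database but is normally in the millions.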
You can find many examples in this page:
DECLARE @table TABLE (ID VARCHAR(1), counter INT)
INSERT INTO @table SELECT 'a', 5
INSERT INTO @table SELECT 'b', 3
INSERT INTO @table SELECT 'c', 31
;WITH cte (ID, counter) AS (
SELECT id, 1
FROM @table
UNION ALL
SELECT c.id, c.counter +1
FROM cte AS c
INNER JOIN @table AS t
ON t.id = c.id
WHERE c.counter + 1 <= t.counter
)
SELECT *
FROM cte
ORDER BY ID, Counter
