Turning string into rows - sql-server

I have an old vintage system with a table looking like this.
OptionsTable
id options
=== ========================
101 Apple,Banana
102 Audi,Mercedes,Volkswagen
In the application that consumes the data, a function will break down the options column into manageable lists and populate dropdowns etc.
The problem is that this kind of data isn't very SQL friendly, making it difficult to make ad-hoc queries and reports.
To that end, I'd like to transform the data into a friendlier view, looking like this:
OptionsView
id name value
=== ========== =====
101 Apple 1
101 Banana 2
102 Audi 1
102 Mercedes 2
102 Volkswagen 3
Now, there have been some topics on splitting string into rows in t-sql (Turning a Comma Separated string into individual rows comes to mind), but apart from splitting the strings into rows, I also need to generate values based on the position in the string.
The plan is to make a view that hides the uglines of the original table.
It will be used in a join with the table housing the answers in order to make ad-hoc statistical queries.
Is there a good way of doing this without having to use cursors etc?

Perhaps adding a udf is overkill for your needs, but I created a split function a long time ago that returns the value, the startposition within the string and the index. With it, the usage in this scenario would be:
select id, String as [Name], ItemIndex as value from OptionsTable
outer apply dbo.Split(options, ',')
Results:
id Name value
101 Apple 1
101 Banana 2
102 Audi 1
102 Mercedes 2
102 Volkswagen 3
And the split function (unrevised since then):
ALTER function [dbo].[Split] (
#StringToSplit varchar(2048),
#Separator varchar(128))
returns table as return
with indices as
(
select 0 S, 1 E, 0 I
union all
select E, charindex(#Separator, #StringToSplit, E) + len(#Separator) , I + 1
from indices
where E > S
)
select substring(#StringToSplit,S,
case when E > len(#Separator) then e-s-len(#Separator) else len(#StringToSplit) - s + 1 end) String
,S StartIndex, I ItemIndex
from indices where S >0

This should work for you:
DECLARE #OptionsTable TABLE
(
id INT
, options VARCHAR(100)
);
INSERT INTO #OptionsTable (id, options)
VALUES (101, 'Apple,Banana')
, (102, 'Audi,Mercedes,Volkswagen');
SELECT OT.id, T.name, t.value
FROM #OptionsTable AS OT
CROSS APPLY (
SELECT T.column1, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM dbo.GetTableFromList(OT.options, ',') AS T
) AS T(name, value);
Here dbo.GetTableFromList is a split string function.
CROSS APPLY executes this function for each row resulting in options split into names in seperate rows. And I used ROW_NUMBER() to add value row, If you want to order result set by name, please use ROW_NUMBER() OVER (ORDER BY t.column1), that should and probably will make results look consistent all the time.
Result:
id name value
-----------------
101 Apple 1
101 Banana 2
102 Audi 1
102 Mercedes 2
102 Volkswagen 3

You could convert your string to XML and then parse the string to transpose it to rows something like this:
SELECT A.[id]
,Split.a.value('.', 'VARCHAR(100)') AS Name
,ROW_NUMBER() OVER (PARTITION BY [id] ORDER BY (SELECT NULL)) as Value
FROM (
SELECT [id]
,CAST('<M>' + REPLACE([options], ',', '</M><M>') + '</M>' AS XML) AS Name
FROM optionstable
) AS A
CROSS APPLY Name.nodes('/M') AS Split(a);
Credits: #SRIRAM
SQL Fiddle Demo

Related

How to repeat a record n times - SQL Server

I'm querying webdata that returns a list of items and the quantity owned. I need to translate that into multiple records - one for each item owned. For example, I might see this result: {"part_id": 118,"quantity": 3}. But in my database I need to be able to interact with each item individually, to assign them locations, properties, etc.
It would look like this:
Part_ID CopyNum
-------------------
118 1
118 2
118 3
In the past, I've kept a table I called [Count] that was just a list of integers from 1 to 100 and I did a cross join with the condition that Count.Num <= Qty
I'd like to do this without the Count table, which seems like a hack. How can I do this on the fly?
If you don't have a tally/numbers table (highly recommended), you can use an ad-hoc tally table in concert with a CROSS APPLY
Example
Declare #YourTable Table ([Part_ID] int,[Quantity] int) Insert Into #YourTable Values
(118,3)
,(125,2)
Select A.Part_ID
,CopyNum = B.N
From #YourTable A
Cross Apply ( Select Top (Quantity) N=Row_Number() Over (Order By (Select NULL))
From master..spt_values n1, master..spt_values n2
) B
Returns
Part_ID CopyNum
118 1
118 2
118 3
125 1
125 2

How do you dynamically Assing Unique Code for every row iin hierarchical table?

My table.
Table1
Id ParentId Name Code
1 Null John
2 1 Harry
3 1 Mary
4 2 Emma
5 2 Kyle
6 4 Robert
7 Null Rohit
I want to assign each individual with the the following format unique hierarchy codes
Output Required
Id ParentId Name Code
1 Null John 1
2 1 Harry 1.1
3 1 Mary 1.2
4 2 Emma 1.1.1
5 2 Kyle 1.1.2
6 4 Robert 1.1.1.1
7 Null Rohit 2
and so on.
I hope I've got this correctly...
You can use a recursive CTE together with ROW_NUMBER() in order to create your codes.
DECLARE #dummy TABLE(Id INT,ParentId INT,[Name] VARCHAR(100));
INSERT INTO #dummy(Id,ParentId,[Name]) VALUES
(1,Null,'John')
,(2,1 ,'Harry')
,(3,1 ,'Mary')
,(4,2 ,'Emma')
,(5,2 ,'Kyle')
,(6,4 ,'Robert')
,(7,Null,'Rohit');
WITH recCTE AS
(
SELECT Id,ParentId,[Name]
,CONCAT(N'.',CAST(ROW_NUMBER() OVER(ORDER BY Id) AS NVARCHAR(MAX))) AS Code
FROM #dummy WHERE ParentId IS NULL
UNION ALL
SELECT d.Id,d.ParentId,d.[Name]
,CONCAT(r.Code,N'.', ROW_NUMBER() OVER(ORDER BY d.Id))
FROM #dummy d
INNER JOIN recCTE r ON d.ParentId=r.Id
)
SELECT Id,ParentId,[Name]
,STUFF(Code,1,1,'') AS Code
FROM RecCTE;
The idea in short:
We pick the rows with ParentId IS NULL and give them a running number.
Now we go iteratively through them (it's a hidden RBAR actually) and call their children, again with a running number.
This we do until there is nothing left.
the final SELECT needs a STUFF in order to to get rid of the first dot.
And with an extension like this, you can create an alphanumerically sortable code:
WITH recCTE AS
(
SELECT Id,ParentId,[Name]
,CONCAT(N'.',CAST(ROW_NUMBER() OVER(ORDER BY Id) AS NVARCHAR(MAX))) AS Code
,CONCAT(N'000',CAST(ROW_NUMBER() OVER(ORDER BY Id) AS NVARCHAR(MAX))) AS Code2
FROM #dummy WHERE ParentId IS NULL
UNION ALL
SELECT d.Id,d.ParentId,d.[Name]
,CONCAT(r.Code,N'.', ROW_NUMBER() OVER(ORDER BY d.Id))
,CONCAT(r.Code2,RIGHT(CONCAT('0000',ROW_NUMBER() OVER(ORDER BY d.Id)),4))
FROM #dummy d
INNER JOIN recCTE r ON d.ParentId=r.Id
)
SELECT Id,ParentId,[Name]
,STUFF(Code,1,1,'') AS Code
,Code2
FROM RecCTE
ORDER BY Code2;

T-SQL getting all unique groups with their usage count

How do I find the unique groups that are present in my table, and display how often that type of group is used?
For example (SQL Server 2008R2)
So, I would like to find out how many times the combination of
PMI 100
RT 100
VT 100
is present in my table and for how many itemid's it is used;
These three form a group because together they are assigned to a single itemid. The same combination is assigned to id 2527 and 2529, so therefore this group is used at least twice. (usagecount = 2)
(and I want to know that for all types of groups that are appearing)
The entire dataset is quite large, about 5.000.000 records, so I'd like to avoid using a cursor.
The number of code/pct combinations per itemid varies between 1 and 6.
The values in the "code" field are not known up front, there are more than a dozen values on average
I tried using pivot, but I got stuck eventually and I also tried various combinations of GROUP-BY and counts.
Any bright ideas?
Example output:
code pct groupid usagecount
PMI 100 1 234
RT 100 1 234
VT 100 1 234
CD 5 2 567
PMI 100 2 567
VT 100 2 567
PMI 100 3 123
PT 100 3 123
VT 100 3 123
RT 100 4 39
VT 100 4 39
etc
Just using a simple group:
SELECT
code
, pct
, COUNT(*)
FROM myTable
GROUP BY
code
, pct
Not too sure if that's more like what you're looking for:
select
uniqueGrp
, count(*)
from (
select distinct
itemid
from myTable
) as I
cross apply (
select
cast(code as varchar(max)) + cast(pct as varchar(max)) + '_'
from myTable
where myTable.itemid = I.itemid
order by code, pct
for xml path('')
) as x(uniqueGrp)
group by uniqueGrp
Either of these should return each combination of code and percentage with a group id for the code and the total number of instances of the code against it. You can use them for also adding the number of instances of the specific code/pct combo too for determining % contribution etc
select
distinct
t.code, t.pct, v.groupcol, v.vol
from
[tablename] t
inner join (select code, rank() over(order by count(*)) as groupcol,
count(*) as vol from [tablename] s
group by code) v on v.code=t.code
or
select
t.code, t.pct, v.groupcol, v.vol
from
(select code, pct from [tablename] group by code, pct) t
inner join (select code, rank() over(order by count(*)) as groupcol,
count(*) as vol from [tablename] s
group by code) v on v.code=t.code
Grouping by Code, and Pct should be enough I think. See the following :
select code,pct,count(p.*)
from [table] as p
group by code,pct

T-SQL select rows by oldest date and unique category

I'm using Microsoft SQL. I have a table that contains information stored by two different categories and a date. For example:
ID Cat1 Cat2 Date/Time Data
1 1 A 11:00 456
2 1 B 11:01 789
3 1 A 11:01 123
4 2 A 11:05 987
5 2 B 11:06 654
6 1 A 11:06 321
I want to extract one line for each unique combination of Cat1 and Cat2 and I need the line with the oldest date. In the above I want ID = 1, 2, 4, and 5.
Thanks
Have a look at row_number() on MSDN.
SELECT *
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY date_time, id) rn
FROM mytable
) q
WHERE rn = 1
(run the code on SQL Fiddle)
Quassnoi's answer is fine, but I'm a bit uncomfortable with how it handles dups. It seems to return based on insertion order, but I'm not sure if even that can be guaranteed? (see these two fiddles for an example where the result changes based on insertion order: dup at the end, dup at the beginning)
Plus, I kinda like staying with old-school SQL when I can, so I would do it this way (see this fiddle for how it handles dups):
select *
from my_table t1
left join my_table t2
on t1.cat1 = t2.cat1
and t1.cat2 = t2.cat2
and t1.datetime > t2.datetime
where t2.datetime is null

Understanding PIVOT function in T-SQL

I am very new to SQL.
I have a table like this:
ID | TeamID | UserID | ElementID | PhaseID | Effort
-----------------------------------------------------
1 | 1 | 1 | 3 | 5 | 6.74
2 | 1 | 1 | 3 | 6 | 8.25
3 | 1 | 1 | 4 | 1 | 2.23
4 | 1 | 1 | 4 | 5 | 6.8
5 | 1 | 1 | 4 | 6 | 1.5
And I was told to get data like this
ElementID | PhaseID1 | PhaseID5 | PhaseID6
--------------------------------------------
3 | NULL | 6.74 | 8.25
4 | 2.23 | 6.8 | 1.5
I understand I need to use PIVOT function. But can't understand it clearly.
It would be great help if somebody can explain it in above case.(or any alternatives if any)
A PIVOT used to rotate the data from one column into multiple columns.
For your example here is a STATIC Pivot meaning you hard code the columns that you want to rotate:
create table temp
(
id int,
teamid int,
userid int,
elementid int,
phaseid int,
effort decimal(10, 5)
)
insert into temp values (1,1,1,3,5,6.74)
insert into temp values (2,1,1,3,6,8.25)
insert into temp values (3,1,1,4,1,2.23)
insert into temp values (4,1,1,4,5,6.8)
insert into temp values (5,1,1,4,6,1.5)
select elementid
, [1] as phaseid1
, [5] as phaseid5
, [6] as phaseid6
from
(
select elementid, phaseid, effort
from temp
) x
pivot
(
max(effort)
for phaseid in([1], [5], [6])
)p
Here is a SQL Demo with a working version.
This can also be done through a dynamic PIVOT where you create the list of columns dynamically and perform the PIVOT.
DECLARE #cols AS NVARCHAR(MAX),
#query AS NVARCHAR(MAX);
select #cols = STUFF((SELECT distinct ',' + QUOTENAME(c.phaseid)
FROM temp c
FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')
,1,1,'')
set #query = 'SELECT elementid, ' + #cols + ' from
(
select elementid, phaseid, effort
from temp
) x
pivot
(
max(effort)
for phaseid in (' + #cols + ')
) p '
execute(#query)
The results for both:
ELEMENTID PHASEID1 PHASEID5 PHASEID6
3 Null 6.74 8.25
4 2.23 6.8 1.5
These are the very basic pivot example kindly go through that.
SQL SERVER – PIVOT and UNPIVOT Table Examples
Example from above link for the product table:
SELECT PRODUCT, FRED, KATE
FROM (
SELECT CUST, PRODUCT, QTY
FROM Product) up
PIVOT (SUM(QTY) FOR CUST IN (FRED, KATE)) AS pvt
ORDER BY PRODUCT
renders:
PRODUCT FRED KATE
--------------------
BEER 24 12
MILK 3 1
SODA NULL 6
VEG NULL 5
Similar examples can be found in the blog post Pivot tables in SQL Server. A simple sample
Ive something to add here which no one mentioned.
The pivot function works great when the source has 3 columns: One for the aggregate, one to spread as columns with for, and one as a pivot for row distribution. In the product example it's QTY, CUST, PRODUCT.
However, if you have more columns in the source it will break the results into multiple rows instead of one row per pivot based on unique values per additional column (as Group By would do in a simple query).
See this example, ive added a timestamp column to the source table:
Now see its impact:
SELECT CUST, MILK
FROM Product
-- FROM (SELECT CUST, Product, QTY FROM PRODUCT) p
PIVOT (
SUM(QTY) FOR PRODUCT IN (MILK)
) AS pvt
ORDER BY CUST
In order to fix this, you can either pull a subquery as a source as everyone has done above - with only 3 columns (this is not always going to work for your scenario, imagine if you need to put a where condition for the timestamp).
Second solution is to use a group by and do a sum of the pivoted column values again.
SELECT
CUST,
sum(MILK) t_MILK
FROM Product
PIVOT (
SUM(QTY) FOR PRODUCT IN (MILK)
) AS pvt
GROUP BY CUST
ORDER BY CUST
GO
A pivot is used to convert one of the columns in your data set from rows into columns (this is typically referred to as the spreading column). In the example you have given, this means converting the PhaseID rows into a set of columns, where there is one column for each distinct value that PhaseID can contain - 1, 5 and 6 in this case.
These pivoted values are grouped via the ElementID column in the example that you have given.
Typically you also then need to provide some form of aggregation that gives you the values referenced by the intersection of the spreading value (PhaseID) and the grouping value (ElementID). Although in the example given the aggregation that will be used is unclear, but involves the Effort column.
Once this pivoting is done, the grouping and spreading columns are used to find an aggregation value. Or in your case, ElementID and PhaseIDX lookup Effort.
Using the grouping, spreading, aggregation terminology you will typically see example syntax for a pivot as:
WITH PivotData AS
(
SELECT <grouping column>
, <spreading column>
, <aggregation column>
FROM <source table>
)
SELECT <grouping column>, <distinct spreading values>
FROM PivotData
PIVOT (<aggregation function>(<aggregation column>)
FOR <spreading column> IN <distinct spreading values>));
This gives a graphical explanation of how the grouping, spreading and aggregation columns convert from the source to pivoted tables if that helps further.
SELECT <non-pivoted column>,
[first pivoted column] AS <column name>,
[second pivoted column] AS <column name>,
...
[last pivoted column] AS <column name>
FROM
(<SELECT query that produces the data>)
AS <alias for the source query>
PIVOT
(
<aggregation function>(<column being aggregated>)
FOR
[<column that contains the values that will become column headers>]
IN ( [first pivoted column], [second pivoted column],
... [last pivoted column])
) AS <alias for the pivot table>
<optional ORDER BY clause>;
USE AdventureWorks2008R2 ;
GO
SELECT DaysToManufacture, AVG(StandardCost) AS AverageCost
FROM Production.Product
GROUP BY DaysToManufacture;
DaysToManufacture AverageCost
0 5.0885
1 223.88
2 359.1082
4 949.4105
-- Pivot table with one row and five columns
SELECT 'AverageCost' AS Cost_Sorted_By_Production_Days,
[0], [1], [2], [3], [4]
FROM
(SELECT DaysToManufacture, StandardCost
FROM Production.Product) AS SourceTable
PIVOT
(
AVG(StandardCost)
FOR DaysToManufacture IN ([0], [1], [2], [3], [4])
) AS PivotTable;
Here is the result set.
Cost_Sorted_By_Production_Days 0 1 2 3 4
AverageCost 5.0885 223.88 359.1082 NULL 949.4105
To set Compatibility error
use this before using pivot function
ALTER DATABASE [dbname] SET COMPATIBILITY_LEVEL = 100
FOR XML PATH might not work on Microsoft Azure Synapse Serve. A possible alternative, following #Taryn dynamic generated cols approach, same results is obtained by using STRING_AGG.
DECLARE #cols AS NVARCHAR(MAX), #query AS NVARCHAR(MAX)
SELECT #cols = STRING_AGG(QUOTENAME(c.phaseid),', ')
/*OPTIONAL: within group (order by cast(t1.[FLOW_SP_SLPM] as INT) asc)*/
FROM (SELECT phaseid FROM temp
GROUP BY phaseid) c
set #query = 'SELECT elementid,' + #cols + ' from
(
select elementid,
phaseid,
effort
from temp
) x
PIVOT
(
max(effort)
for phaseid in (' + #cols + ')
) p '
execute(#query)

Resources