Is it possible to use STRING_SPLIT in an ORDER BY clause? - sql-server

I am trying to order values that are going to be inserted into another table based on their order value from a secondary table. We are migrating old data to a new table with a slightly different structure.
I thought I would be able to accomplish this by using the string_split function but I'm not very skilled with SQL and so I am running into some issues.
Here is what I have:
UPDATE lse
SET Options = a.Options
FROM
dbo.LessonStepElement as lse
CROSS JOIN
(
SELECT
tbl1.*
tbl2.Options,
tbl2.QuestionId
FROM
dbo.TrainingQuestionAnswer as tbl1
JOIN (
SELECT
string_agg((CASE
WHEN tqa.CorrectAnswer = 1 THEN REPLACE(tqa.AnswerText, tqa.AnswerText, '*' + tqa.AnswerText)
ELSE tqa.AnswerText
END),
char(10)) as Options,
tq.Id as QuestionId
FROM
dbo.TrainingQuestionAnswer as tqa
INNER JOIN
dbo.TrainingQuestion as tq
on tq.Id = tqa.TrainingQuestionId
INNER JOIN
dbo.Training as t
on t.Id = tq.TrainingId
WHERE
t.IsDeleted = 0
and tq.IsDeleted = 0
and tqa.IsDeleted = 0
GROUP BY
tq.Id,
tqa.AnswerDisplayOrder
ORDER BY
(SELECT [Value] FROM STRING_SPLIT((SELECT AnswerDisplayOrder FROM dbo.TrainingQuestion WHERE Id = tmq.Id), ','))
) as tbl2
on tbl1.TrainingQuestionId = tbl2.QuestionId
) a
WHERE
a.TrainingQuestionId = lse.TrainingQuestionId
The AnswerDisplayOrder that I am using is just a nvarchar comma separated list of the ids for the answers to the question.
Here is an example:
I have 3 rows in the TrainingQuestionAnswer table that look like the following.
ID TrainingQuestionId AnswerText
-------------------------------------------
215 100 No
218 100 Yes
220 100 I'm not sure
I have 1 row in the TrainingQuestion table that looks like the following.
ID AnswerDisplayOrder
--------------------------
100 "218,215,220"
Now what I am trying to do is when I update the row in the new table with all of the answers combined, the answers will need to be in the correct order which is dependent on the AnswerDisplayOrder in the TrainingQuestion table. So in essence, the new table would have a row that would look similar to the following.
ID Options
--------------
193 "Yes No I'm not sure"
I'm aware that the way I'm trying to do it might not be even possible at all. I am still learning and would just love some advice or guidance on how to make this work. I also know that string_split does not guarantee order. I'm open to other suggestions that do guarantee the order as well.

I simplified the issue in the question to the following approach, that is a possible solution to your problem. If you want to get the results from the question, you need a splitter, that returns the substrings and the positions of the substrings. STRING_SPLIT() is available from SQL Server 2016, but is not an option here, because (as is mentioned in the documentation) the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.
But you may try to use a JSON based approach, with a small string manipulation, that transforms answers IDs into a valid JSON array (218,215,220 into [218,215,220]). After that you can easily parse this JSON array with OPENJSON() and default schema. The result is a table, with columns key, value and type and the key column (again from the documentation) holds the index of the element in the specified array.
Tables:
CREATE TABLE TrainingQuestionId (
ID int,
TrainingQuestionId int,
AnswerText varchar(1000)
)
INSERT INTO TrainingQuestionId
(ID, TrainingQuestionId, AnswerText)
VALUES
(215, 100, 'No'),
(218, 100, 'Yes'),
(220, 100, 'I''m not sure')
CREATE TABLE TrainingQuestion (
ID int,
AnswerDisplayOrder varchar(1000)
)
INSERT INTO TrainingQuestion
(ID, AnswerDisplayOrder)
VALUES
(100, '218,215,220')
Statement:
SELECT tq.ID, oa.Options
FROM TrainingQuestion tq
OUTER APPLY (
SELECT STRING_AGG(tqi.AnswerText, ' ') WITHIN GROUP (ORDER BY CONVERT(int, j.[key])) AS Options
FROM OPENJSON(CONCAT('[', tq.AnswerDisplayOrder, ']')) j
LEFT JOIN TrainingQuestionId tqi ON TRY_CONVERT(int, j.[value]) = tqi.ID
) oa
Result:
ID Options
100 Yes No I'm not sure
Notes: You need SQL Server 2017+ to use STRING_AGG(). For SQL Server 2016 you need a FOR XML to aggregate strings.

declare #TrainingQuestionAnswer table
(
ID int,
TrainingQuestionId int,
AnswerText varchar(20)
);
insert into #TrainingQuestionAnswer(ID, TrainingQuestionId, AnswerText)
values(215, 100, 'No'), (218, 100, 'Yes'), (220, 100, 'I''m not sure');
declare #TrainingQuestiontest table
(
testid int identity,
QuestionId int,
AnswerDisplayOrder varchar(200)
);
insert into #TrainingQuestiontest(QuestionId, AnswerDisplayOrder)
values(100, '218,215,220'), (100, '220,218,215'), (100, '215,218');
select *,
(
select string_agg(pci.AnswerText, '==') WITHIN GROUP ( ORDER BY pci.pos)
from
(
select a.AnswerText,
pos = charindex(concat(',', a.ID, ','), concat(',', q.AnswerDisplayOrder,','))
from #TrainingQuestionAnswer as a
where a.TrainingQuestionId = q.QuestionId
and charindex(concat(',', a.ID, ','), concat(',', q.AnswerDisplayOrder,',')) >= 1
) as pci
) as TestAnswerText
from #TrainingQuestiontest as q;

Related

Why TRY_PARSE its so slow?

I have this query that basically returns (right now) only 10 rows as results:
select *
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
Now, if I want to parse the field FakeData (which, unfortunately, can contain different types of data, from DateTime to Surname/etc; i.e. nvarchar(70)), for data show and/or filtering:
select *, TRY_PARSE(t.FakeData as date USING 'en-GB') as RealDate
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
It takes x10 the query to be executed.
Where am I wrong? How can I speed up?
I can't edit the database, I'm just a customer which read data.
The TSQL documentation for TRY_PARSE makes the following observation:
Keep in mind that there is a certain performance overhead in parsing the string value.
NB: I am assuming your typical date format would be dd/mm/yyyy.
The following is something of a shot-in-the-dark that might help. By progressively assessing the nvarchar column if it is a candidate as a date it is possible to reduce the number of uses of that function. Note that a data point established in one apply can then be referenced in a subsequent apply:
CREATE TABLE mytable(
FakeData NVARCHAR(60) NOT NULL
);
INSERT INTO mytable(FakeData) VALUES (N'oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc');
INSERT INTO mytable(FakeData) VALUES (N'9603200-0297r2-0--824');
INSERT INTO mytable(FakeData) VALUES (N'12/03/1967');
INSERT INTO mytable(FakeData) VALUES (N'12/3/2012');
INSERT INTO mytable(FakeData) VALUES (N'3/3/1812');
INSERT INTO mytable(FakeData) VALUES (N'ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu');
select
t.FakeData, oa3.RealDate
from mytable as t
outer apply (
select len(FakeData) as fd_len
) oa1
outer apply (
select case when oa1.fd_len > 10 then 0
when len(replace(FakeData,'/','')) + 2 = oa1.fd_len then 1
else 0
end as is_candidate
) oa2
outer apply (
select case when oa2.is_candidate = 1 then TRY_PARSE(t.FakeData as date USING 'en-GB') end as RealDate
) oa3
FakeData
RealDate
oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc
null
9603200-0297r2-0--824
null
12/03/1967
1967-03-12
12/3/2012
2012-03-12
3/3/1812
1812-03-03
ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu
null
db<>fiddle here

How to Join 2 Tables in SQL Server on a Field having ForeName and Surnames in unknown order

Wondering if you could help me with something. We get agency invoices for workers who worked for different clients on our behalf. Now these Invoices have worker's surname and Forenames in completely random order. Now my SSIS package import these invoices to a database. I am trying to automate few reports based on this data. Now i would like to Join this Invoice data with our data warehouse Employee and Shift table. Only fields i can join this on is Name and Shift date. How can i Join the DW Employee table to Imported Invoices as Names in Invoice tables are all over the place. Any nice SQL function that can help ,e. I have tried Joins using Concat and Like but it didn't work. Suggestions would be highly appreciated.
Thanks and Regards
Khurram
This is a real challenge for you. Others have already suggested that you might not want to do this, but I will give it a shot anyway. I think you need to create two parts, the first part and the second part, in both tables. Then you can compare them. The following code might lead you in the right direction:
SELECT LEFT ('John Jackson', CHARINDEX (' ', 'John Jackson') - 1) AS FirstPart,
SUBSTRING ('John Jackson', CHARINDEX (' ', 'John Jackson'), LEN ('John Jackson') - CHARINDEX (' ', 'John Jackson') + 1) AS SecondPart
SELECT *
FROM TableA AS A
LEFT JOIN TableB AS B ON A.FirstPart = B.FirstPart AND A.SecondPart = B.SecondPart
UNION ALL
SELECT * FROM TableA
LEFT JOIN TableB AS B2 ON A.FirstPart = B2.SecondPart AND A.SecondPart = B2.FirstPart
Good luck in handling this!
As has already been said, this is a challenge with no good answer outside of a redesign. But, like SQL_M, I figured I'd try it. Assuming that dwEmployee is the source of record, I initially went with something similar to SQL_M's approach with this:
SELECT *
FROM dbo.invoiceTable i
JOIN dwEmployee d
ON
(
d.[DW_personname] LIKE CONCAT('%',SUBSTRING(i.[Name],0,CHARINDEX(' ',i.[name])),' %')
OR d.[DW_personname] LIKE CONCAT('% ',SUBSTRING(i.[Name],0,CHARINDEX(' ',i.[name])),'%')
)
AND
(
d.[DW_personname] LIKE CONCAT('%',SUBSTRING(i.[Name],CHARINDEX(' ',i.[name])+1,LEN(I.[Name]) - CHARINDEX(' ',i.[name])),' %')
OR d.[DW_personname] LIKE CONCAT('% ',SUBSTRING(i.[Name],CHARINDEX(' ',i.[name])+1,LEN(I.[Name]) - CHARINDEX(' ',i.[name])),'%')
)
It works great until it hits a name that has three distinct parts... and then I couldn't be buggered to get it working - I'm leaving it here in case it helps you build on something.
I ended up using a CTE and STRING_SPLIT combination to make potential matches across the tables. You had mentioned that you can join on Shiftdate, but didn't say exactly how you wanted the results to look, and it seemed the name join was the big issue, so I just focused on that. Depending on what version of SQL you're on STRING_SPLIT might not be available to you, in which case you'd have to use a different split function to make this method work. (demo here http://sqlfiddle.com/#!18/4bd31/2/1 )
CREATE TABLE invoiceTable
(
[Invoice_ID] INT, [ShiftDate] DATE, [Ref_Num] INT, [Name] VARCHAR(200)
)
CREATE TABLE dwEmployee
(
[Shiftdate] DATE, [DW_personname] VARCHAR(200), [Timesheetserial] VARCHAR(200)
)
INSERT INTO dbo.invoiceTable
VALUES
(807, '2018-09-02',83789315,'ABCD EFGH'), (195, '2018-09-14',83789315,'EFGH ABCD'), (227, '2018-09-15',83789315,'WXYZ EFGH-ABCD'), (246, '2018-09-16',83789315,'JKLM OPQR'),(1398, '2018-09-19',83789315,'STUV IJKKL WXYZ')
INSERT INTO dbo.dwEmployee
VALUES
( '2018-10-22','EFGH ABCD','Z3746543'), ( '2018-10-29','EFIH ABCD','Z3746550'), ( '2018-10-26','EFGH-ABCD WXYZ','Z3746557'),( '2018-10-26','EFGH-ABCD WXYZ','Z3746557')
--my additional insert for testing three-part name
INSERT INTO dbo.dwEmployee
VALUES
( '2018-10-31','WXYZ STUV IJKKL','Z0000000');
--work
WITH nameSplitter AS
(
SELECT
[Invoice_ID], CAST(NULL AS VARCHAR(200)) AS [Timesheetserial], [Value]
FROM invoiceTable
CROSS APPLY STRING_SPLIT([Name], ' ')
UNION ALL
SELECT
NULL, [Timesheetserial], Value
FROM dwEmployee
CROSS APPLY STRING_SPLIT([DW_personname], ' ')
),
potentialMatches AS
(
SELECT
ns1.[Invoice_ID], ns2.[Timesheetserial]
FROM nameSplitter ns1
JOIN nameSplitter ns2
ON ns2.value = ns1.value
WHERE ns1.[Invoice_ID] IS NOT NULL
AND ns2.[Timesheetserial] IS NOT NULL
GROUP BY ns1.[Invoice_ID], ns2.[Timesheetserial]
HAVING COUNT(ns2.[Timesheetserial]) = (SELECT COUNT([Timesheetserial]) FROM nameSplitter WHERE [Timesheetserial] = ns2.[Timesheetserial] )
)
SELECT i.*, d.*
FROM potentialMatches p
join dbo.invoiceTable i
ON P.[Invoice_ID] = I.[Invoice_ID]
JOIN dwEmployee d
ON p.[Timesheetserial] = d.[Timesheetserial]
If you only got one forename and one surname, you could try something like this:
....
FROM tableA AS A
INNER JOIN tableB AS B
ON A.Name = B.DW_personname OR A.Name = RIGHT(B.DW_personname, LEN(DW_personname) - CHARINDEX(' ', B.DW_personname) +1) + ' ' + LEFT(B.DW_personname, CHARINDEX(' ', B.DW_personname))

ROW_NUMBER in cross apply generating "incorrect" values based on exists clause

Here is the sql:
-- Schema
DECLARE #ModelItem TABLE (
ModelItemId UNIQUEIDENTIFIER,
MetamodelItemId UNIQUEIDENTIFIER
)
DECLARE #MetamodelItemAncestor TABLE (
MetamodelItemId UNIQUEIDENTIFIER,
ParentMetamodelItemId UNIQUEIDENTIFIER,
AncestorLevel INT
)
DECLARE #SolutionMetamodelItem TABLE (
MetamodelItemId UNIQUEIDENTIFIER,
SolutionId UNIQUEIDENTIFIER
)
INSERT INTO #ModelItem VALUES ('EC6AC6A9-684E-E611-8117-00155D026308', '2AB1F075-684E-E611-8117-00155D026308')
INSERT INTO #MetamodelItemAncestor
VALUES ('2AB1F075-684E-E611-8117-00155D026308', '2AB1F075-684E-E611-8117-00155D026308', 0),
('2AB1F075-684E-E611-8117-00155D026308', 'AA12E380-CA4D-E611-8117-00155D026308', 1)
INSERT INTO #SolutionMetamodelItem
VALUES ('2AB1F075-684E-E611-8117-00155D026308', 'f612a333-ca4d-e611-8117-00155d026308'),
('AA12E380-CA4D-E611-8117-00155D026308', 'fc160f3e-ca4d-e611-8117-00155d026308')
-- query
DECLARE #ModelItemId TABLE (EntityId UNIQUEIDENTIFIER)
DECLARE #SolutionId TABLE (EntityId UNIQUEIDENTIFIER)
INSERT INTO #ModelItemId
VALUES ('EC6AC6A9-684E-E611-8117-00155D026308')
INSERT INTO #SolutionId
VALUES ('f612a333-ca4d-e611-8117-00155d026308'), ('fc160f3e-ca4d-e611-8117-00155d026308')
SELECT mia.*
FROM (
SELECT M.EntityId AS ModelItemId, S.EntityId AS SolutionId
FROM #ModelItemId AS M
CROSS JOIN #SolutionId AS S
) AS m
CROSS APPLY (
SELECT
MI.ModelItemId,
OTA.ParentMetamodelItemId AS [MetamodelItemId],
ROW_NUMBER() OVER (PARTITION BY [MI].[ModelItemId] ORDER BY [OTA].[AncestorLevel] ASC) AS [AspectRank]
FROM #ModelItem AS MI
INNER JOIN #MetamodelItemAncestor AS OTA
ON MI.MetamodelItemId = OTA.MetamodelItemId
WHERE
MI.ModelItemId = m.ModelItemId
AND EXISTS (
SELECT 1
FROM #SolutionMetamodelItem AS MSMI
WHERE MSMI.MetamodelItemId = OTA.ParentMetamodelItemId
AND MSMI.SolutionId = m.SolutionId
)
) mia
SELECT mia.*
FROM #ModelItemId AS m
CROSS APPLY (
SELECT
MI.ModelItemId,
OTA.ParentMetamodelItemId AS [MetamodelItemId],
ROW_NUMBER() OVER (PARTITION BY [MI].[ModelItemId] ORDER BY [OTA].[AncestorLevel] ASC) AS [AspectRank]
FROM #ModelItem as MI
INNER JOIN #MetamodelItemAncestor AS OTA
ON MI.MetamodelItemId = OTA.MetamodelItemId
WHERE
MI.ModelItemId = m.EntityId
AND EXISTS (
SELECT 1
FROM #SolutionMetamodelItem MSMI
WHERE MSMI.MetamodelItemId = OTA.ParentMetamodelItemId
AND MSMI.SolutionId IN (SELECT s.EntityId FROM #SolutionId AS s)
)
) mia
Notice the AspectRank. In the second query it has correctly increased the value sequentially based on the partition.
Looking at the execution plan, for the first query it seems like the row_number (sequence project) is running concurrently to the scan of the #solution table, but I still am not fully sure why it has not increased the row number value since there a duplicate items.
Could someone explain this? I need to use the first approach because the cross apply query is in fact a UDF with the ModelItemId and SolutionId as parameters.
I would assume the cross apply is executed separately for each of the rows in your outer query -> each of the rows returned is the 1st (and only) row.
Why do you need to have the row number inside the cross apply, instead of being in the outer query, if that's actually where your data is?

How to check for a specific condition by looping through every record in SQL Server?

I do have following table
ID Name
1 Jagan Mohan Reddy868
2 Jagan Mohan Reddy869
3 Jagan Mohan Reddy
Name column size is VARCHAR(55).
Now for some other task we need to take only 10 varchar length i.e. VARCHAR(10).
My requirement is to check that after taking the only 10 bits length of Name column value for eg if i take Name value of ID 1 i.e. Jagan Mohan Reddy868 by SUBSTRING(Name, 0,11) if it equals with another row value. here in this case the final value of SUBSTRING(Jagan Mohan Reddy868, 0,11) is equal to Name value of ID 3 row whose Name is 'Jagan Mohan Reddy'. I need to make a list of those kind rows. Can somebody help me out on how can i achieve in SQL Server.
My main check is that the truncated values of my Name column should not match with any non truncated values of Name column. If so i need to get those records.
Assuming I understand the question, I think you are looking for something like this:
Create and populate sample data (Please save us this step in your future questions)
DECLARE #T as TABLE
(
Id int identity(1,1),
Name varchar(15)
)
INSERT INTO #T VALUES
('Hi, I am Zohar.'),
('Hi, I am Peled.'),
('Hi, I am Z'),
('I''m Zohar peled')
Use a cte with a self inner join to get the list of ids that match the first 10 chars:
;WITH cte as
(
SELECT T2.Id As Id1, T1.Id As Id2
FROM #T T1
INNER JOIN #T T2 ON LEFT(T1.Name, 10) = t2.Name AND T1.Id <> T2.Id
)
Select the records from the original table, inner joined with a union of the Id1 and Id2 from the cte:
SELECT T.Id, Name
FROM #T T
INNER JOIN
(
SELECT Id1 As Id
FROM CTE
UNION
SELECT Id2
FROM CTE
) U ON T.Id = U.Id
Results:
Id Name
----------- ---------------
1 Hi, I am Zohar.
3 Hi, I am Z
Try this
SELECT Id,Name
FROM(
SELECT *,ROW_NUMBER() OVER(PARTITION BY Name, LEFT(Name,11) ORDER BY ID) RN
FROM Tbale1 T
) Tmp
WHERE Tmp.RN = 1
loop over your column for all the values and put your substring() function inside this loop and I think in Sql index of string starts from 1 instead of 0. If you pass your string to charindex() like this
CHARINDEX('Y', 'Your String')
thus you will come to know whether it is starting from 0 or 1
and you can save your substring value as value of other column with length 10
I hope it will help you..
I think this should cover all the cases you are looking for.
-- Create Table
DECLARE #T as TABLE
(
Id int identity(1,1),
Name varchar(55)
)
-- Create Data
INSERT INTO #T VALUES
('Jagan Mohan Reddy868'),
('Jagan Mohan Reddy869'),
('Jagan Mohan Reddy'),
('Mohan Reddy'),
('Mohan Reddy123551'),
('Mohan R')
-- Get Matching Items
select *, SUBSTRING(name, 0, 11) as ShorterName
from #T
where SUBSTRING(name, 0, 11) in
(
-- get all shortnames with a count > 1
select SUBSTRING(name, 0, 11) as ShortName
from #T
group by SUBSTRING(name, 0, 11)
having COUNT(*) > 1
)
order by Name, LEN(Name)

Concat and loop functions

I have this problem:
For each plan that start with M and A, I need to add 3 additional chars to it and keep the original. Also, plans that start with T needs to stay the same:
For example:
M_VA_K15CVA
M_VA_M20CVA
M_VA_T234
should be
M_VA_K15CVA
M_VA_K15CVA_V1
M_VA_K15CVA_V2
M_VA_K15CVA_V3
M_VA_M20CVA
M_VA_M20CVA_V1
M_VA_M20CVA_V2
M_VA_M20CVA_V3
M_VA_TNT10-VA
Any hints on what should I use to do this file? Thank you
You can expand a list into a larger list by JOINing to a derived table with a lose JOIN condition. Here is code which meets your needs, assuming that you meant "M" and "K" in your question, not "M" and "A", which didn't seem to match your example.
DECLARE #Plans TABLE
(
PlanID varchar(50)
);
INSERT INTO #Plans (PlanID) VALUES ('M_VA_K15CVA');
INSERT INTO #Plans (PlanID) VALUES ('M_VA_M20CVA');
INSERT INTO #Plans (PlanID) VALUES ('M_VA_T234');
SELECT
CASE
WHEN RowValue IS NULL OR RowValue = 0 THEN
PlanID
ELSE
PlanID + '_V' + CONVERT(varchar,RowValue)
END AS NewPlanID
FROM #Plans
LEFT JOIN (
SELECT 0 AS RowValue
UNION ALL SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
) AS RowExpander
ON RIGHT(LEFT(PlanID,6),1) IN ('M','K');

Resources