Why is TRY_PARSE so slow? - sql-server

I have this query that (right now) returns only 10 rows:
select *
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
Now, if I want to parse the FakeData field (which, unfortunately, can contain different kinds of data, from datetimes to surnames; it is an nvarchar(70)) for display and/or filtering:
select *, TRY_PARSE(t.FakeData as date USING 'en-GB') as RealDate
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
The query takes roughly 10x as long to execute.
What am I doing wrong? How can I speed it up?
I can't modify the database; I'm just a customer who reads the data.

The T-SQL documentation for TRY_PARSE makes the following observation:
Keep in mind that there is a certain performance overhead in parsing the string value.
NB: I am assuming your typical date format would be dd/mm/yyyy.
The following is something of a shot in the dark that might help. By progressively assessing whether the nvarchar column is a candidate for a date, it is possible to reduce the number of calls to that function. Note that a value established in one APPLY can then be referenced in a subsequent APPLY:
CREATE TABLE mytable(
    FakeData NVARCHAR(60) NOT NULL
);
INSERT INTO mytable(FakeData) VALUES (N'oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc');
INSERT INTO mytable(FakeData) VALUES (N'9603200-0297r2-0--824');
INSERT INTO mytable(FakeData) VALUES (N'12/03/1967');
INSERT INTO mytable(FakeData) VALUES (N'12/3/2012');
INSERT INTO mytable(FakeData) VALUES (N'3/3/1812');
INSERT INTO mytable(FakeData) VALUES (N'ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu');
select t.FakeData, oa3.RealDate
from mytable as t
outer apply (
    select len(FakeData) as fd_len
) oa1
outer apply (
    select case when oa1.fd_len > 10 then 0
                when len(replace(FakeData, '/', '')) + 2 = oa1.fd_len then 1
                else 0
           end as is_candidate
) oa2
outer apply (
    select case when oa2.is_candidate = 1
                then TRY_PARSE(t.FakeData as date USING 'en-GB')
           end as RealDate
) oa3;
FakeData                                                  RealDate
--------------------------------------------------------  ----------
oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc      null
9603200-0297r2-0--824                                     null
12/03/1967                                                1967-03-12
12/3/2012                                                 2012-03-12
3/3/1812                                                  1812-03-03
ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu  null
db<>fiddle here
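A further thought, beyond the original answer: since the expected format is dd/mm/yyyy, TRY_CONVERT with style 103 (British/French) may be worth benchmarking as an alternative. TRY_PARSE is implemented via the .NET CLR, which is where much of its overhead comes from, while TRY_CONVERT is a native conversion. A minimal sketch against the same sample table:
-- TRY_CONVERT with style 103 (dd/mm/yyyy); returns NULL for values that are not valid dates.
select t.FakeData,
       TRY_CONVERT(date, t.FakeData, 103) as RealDate
from mytable as t;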

Related

How to optimize the insert query from multiple tables?

I have 2 tables: Table 1 (a temp table in a stored procedure) has around 400 records; Table 2 has around 30,550,284 records.
I need to loop over each record in Table 1 and get the top 1 row from Table 2 based on a few conditions (where clause), ordered by modified date in descending order.
There is an index on the modified date.
declare @iPos int;
declare @iCount int;
select @iCount = count(*) from #Table1;
set @iPos = 1;
declare @Table2 table(......);
declare @timestampLocal2 datetime;
-- @val1 and @timestampLocal are assumed to be declared earlier (types elided in the question)
while (@iPos <= @iCount)
BEGIN
    select @val1 = Col1, @timestampLocal = [TimeStamp]
    from #Table1 where ID = @iPos;

    set @timestampLocal2 = DATEADD(HH, -96, @timestampLocal);

    INSERT INTO #Temp3 ( .... )
    select top 1 r.LastModified, r.[Col2], r.Col3, @iPos
    from Table2 (NOLOCK) r
    where r.Col1 = @val1
      and r.LastModified <= @timestampLocal
      and r.LastModified >= @timestampLocal2
      and (r.Col2 is not null and r.Col3 is not null)
    order by r.LastModified desc;

    SELECT @iPos = @iPos + 1;
END
This query is very slow.
I have also thought about archiving Table 2, but I want to keep that as a second option for now.
Do I really need to add an index on the columns involved in the where clause?
So my question is: in terms of performance, is there a better way to do this?
I believe a CROSS APPLY or OUTER APPLY may do the trick. These can be thought of as being similar to INNER JOIN or LEFT JOIN, except that they allow you to reference a subquery having more complex conditions such as TOP 1 and ORDER BY. Ideal for cases like this.
-- INSERT INTO #Temp3 ( .... )
select r.LastModified, r.[Col2], r.Col3, t1.ID
from #Table1 t1
cross apply (
    SELECT TOP 1 r.*
    from Table2 r -- Don't use (NOLOCK)
    where r.Col1 = t1.Col1
      and r.LastModified <= t1.[TimeStamp]
      and r.LastModified >= DATEADD(HH, -96, t1.[TimeStamp])
      and (r.Col2 is not null and r.Col3 is not null)
    order by r.LastModified desc
) r
For efficiency, I recommend an index on Table2(Col1,LastModified) or as an absolute minimum, an index on Table2(Col1).
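A sketch of that recommended index (the index name is illustrative, and the INCLUDE list is an assumption to cover the selected columns):
-- Supports the CROSS APPLY lookup: seek on Col1, range-scan LastModified.
-- Index name and INCLUDE list are illustrative assumptions.
CREATE INDEX IX_Table2_Col1_LastModified
    ON Table2 (Col1, LastModified DESC)
    INCLUDE (Col2, Col3);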
I would strongly discourage the use of (NOLOCK) or READ UNCOMMITTED in queries that update the database (like the INSERT INTO #Temp3 above). While the query may appear to work most of the time, seemingly random occurrences of missing or duplicate rows may result.
Do you need to handle cases where no matching Table2 record is found? The above will quietly ignore such cases. Changing the CROSS APPLY to an OUTER APPLY together with logic to handle null r.xxx values could be what you need.
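A minimal sketch of that OUTER APPLY variant, reusing the names above (the trailing null check is one possible way to flag unmatched rows):
select t1.ID, r.LastModified, r.[Col2], r.Col3
from #Table1 t1
outer apply (
    SELECT TOP 1 r.*
    from Table2 r
    where r.Col1 = t1.Col1
      and r.LastModified <= t1.[TimeStamp]
      and r.LastModified >= DATEADD(HH, -96, t1.[TimeStamp])
      and (r.Col2 is not null and r.Col3 is not null)
    order by r.LastModified desc
) r
-- rows with no Table2 match come back with NULL r.* values, e.g.:
-- where r.LastModified is null  -- to see only the unmatched Table1 rows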

Is it possible to use STRING_SPLIT in an ORDER BY clause?

I am trying to order values that are going to be inserted into another table based on their order value from a secondary table. We are migrating old data to a new table with a slightly different structure.
I thought I would be able to accomplish this by using the string_split function but I'm not very skilled with SQL and so I am running into some issues.
Here is what I have:
UPDATE lse
SET Options = a.Options
FROM
dbo.LessonStepElement as lse
CROSS JOIN
(
SELECT
tbl1.*,
tbl2.Options,
tbl2.QuestionId
FROM
dbo.TrainingQuestionAnswer as tbl1
JOIN (
SELECT
string_agg((CASE
WHEN tqa.CorrectAnswer = 1 THEN REPLACE(tqa.AnswerText, tqa.AnswerText, '*' + tqa.AnswerText)
ELSE tqa.AnswerText
END),
char(10)) as Options,
tq.Id as QuestionId
FROM
dbo.TrainingQuestionAnswer as tqa
INNER JOIN
dbo.TrainingQuestion as tq
on tq.Id = tqa.TrainingQuestionId
INNER JOIN
dbo.Training as t
on t.Id = tq.TrainingId
WHERE
t.IsDeleted = 0
and tq.IsDeleted = 0
and tqa.IsDeleted = 0
GROUP BY
tq.Id,
tqa.AnswerDisplayOrder
ORDER BY
(SELECT [Value] FROM STRING_SPLIT((SELECT AnswerDisplayOrder FROM dbo.TrainingQuestion WHERE Id = tmq.Id), ','))
) as tbl2
on tbl1.TrainingQuestionId = tbl2.QuestionId
) a
WHERE
a.TrainingQuestionId = lse.TrainingQuestionId
The AnswerDisplayOrder that I am using is just an nvarchar comma-separated list of the IDs of the answers to the question.
Here is an example:
I have 3 rows in the TrainingQuestionAnswer table that look like the following.
ID TrainingQuestionId AnswerText
-------------------------------------------
215 100 No
218 100 Yes
220 100 I'm not sure
I have 1 row in the TrainingQuestion table that looks like the following.
ID AnswerDisplayOrder
--------------------------
100 "218,215,220"
Now what I am trying to do is when I update the row in the new table with all of the answers combined, the answers will need to be in the correct order which is dependent on the AnswerDisplayOrder in the TrainingQuestion table. So in essence, the new table would have a row that would look similar to the following.
ID Options
--------------
193 "Yes No I'm not sure"
I'm aware that the way I'm trying to do it might not even be possible. I am still learning and would just love some advice or guidance on how to make this work. I also know that string_split does not guarantee order; I'm open to other suggestions that do guarantee the order as well.
I simplified the issue in the question down to the following approach, which is a possible solution to your problem. To get the results from the question, you need a splitter that returns both the substrings and their positions. STRING_SPLIT() is available from SQL Server 2016, but it is not an option here, because (as mentioned in the documentation) the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.
But you may try a JSON-based approach, with a small string manipulation that transforms the answer IDs into a valid JSON array (218,215,220 into [218,215,220]). After that you can easily parse this JSON array with OPENJSON() and the default schema. The result is a table with columns key, value and type, and the key column (again, from the documentation) holds the index of the element in the specified array.
Tables:
CREATE TABLE TrainingQuestionId (
    ID int,
    TrainingQuestionId int,
    AnswerText varchar(1000)
);
INSERT INTO TrainingQuestionId (ID, TrainingQuestionId, AnswerText)
VALUES (215, 100, 'No'),
       (218, 100, 'Yes'),
       (220, 100, 'I''m not sure');

CREATE TABLE TrainingQuestion (
    ID int,
    AnswerDisplayOrder varchar(1000)
);
INSERT INTO TrainingQuestion (ID, AnswerDisplayOrder)
VALUES (100, '218,215,220');
Statement:
SELECT tq.ID, oa.Options
FROM TrainingQuestion tq
OUTER APPLY (
    SELECT STRING_AGG(tqi.AnswerText, ' ') WITHIN GROUP (ORDER BY CONVERT(int, j.[key])) AS Options
    FROM OPENJSON(CONCAT('[', tq.AnswerDisplayOrder, ']')) j
    LEFT JOIN TrainingQuestionId tqi ON TRY_CONVERT(int, j.[value]) = tqi.ID
) oa
Result:
ID   Options
---  --------------------
100  Yes No I'm not sure
Notes: You need SQL Server 2017+ to use STRING_AGG(). For SQL Server 2016 you need a FOR XML PATH approach to aggregate the strings.
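A hedged sketch of that FOR XML fallback, reusing the tables above (this assumes database compatibility level 130+ so that OPENJSON is available on SQL Server 2016):
-- STUFF(..., 1, 1, '') strips the leading separator added by FOR XML PATH.
SELECT tq.ID,
       STUFF((
           SELECT ' ' + tqi.AnswerText
           FROM OPENJSON(CONCAT('[', tq.AnswerDisplayOrder, ']')) j
           LEFT JOIN TrainingQuestionId tqi ON TRY_CONVERT(int, j.[value]) = tqi.ID
           ORDER BY CONVERT(int, j.[key])
           FOR XML PATH(''), TYPE
       ).value('.', 'nvarchar(max)'), 1, 1, '') AS Options
FROM TrainingQuestion tq;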
declare @TrainingQuestionAnswer table
(
    ID int,
    TrainingQuestionId int,
    AnswerText varchar(20)
);
insert into @TrainingQuestionAnswer(ID, TrainingQuestionId, AnswerText)
values (215, 100, 'No'), (218, 100, 'Yes'), (220, 100, 'I''m not sure');

declare @TrainingQuestiontest table
(
    testid int identity,
    QuestionId int,
    AnswerDisplayOrder varchar(200)
);
insert into @TrainingQuestiontest(QuestionId, AnswerDisplayOrder)
values (100, '218,215,220'), (100, '220,218,215'), (100, '215,218');

select *,
(
    select string_agg(pci.AnswerText, '==') within group (order by pci.pos)
    from
    (
        select a.AnswerText,
               pos = charindex(concat(',', a.ID, ','), concat(',', q.AnswerDisplayOrder, ','))
        from @TrainingQuestionAnswer as a
        where a.TrainingQuestionId = q.QuestionId
          and charindex(concat(',', a.ID, ','), concat(',', q.AnswerDisplayOrder, ',')) >= 1
    ) as pci
) as TestAnswerText
from @TrainingQuestiontest as q;

Remove string portion from inconsistent string of comma-separated values

SQL Server 2017 on Azure.
Given a field called Categories in a table called dbo.sources:
ID  Categories
--  --------------------------
1   ABC01, FFG02, ERERE, CC201
2   GDF01, ABC01, GREER, DS223
3   DSF12, GREER
4   ABC01
5   NULL
What is the syntax for a query that would remove ABC01 from any record where it exists, but keep the other codes in the string?
Results would be:
ID  Categories
--  --------------------------
1   FFG02, ERERE, CC201
2   GDF01, GREER, DS223
3   DSF12, GREER
4   NULL
5   NULL
Normalising and then denormalising your data, you can do this:
USE Sandbox;
GO
CREATE TABLE dbo.Sources (ID int,
                          Categories varchar(MAX));
INSERT INTO dbo.Sources
VALUES (1,'ABC01,FFG02,ERERE,CC201'), -- I assume you don't really have the space
       (2,'GDF01,ABC01,GREER,DS223'),
       (3,'DSF12,GREER'),
       (4,'ABC01'),
       (5,NULL);
GO
DECLARE @Source varchar(5) = 'ABC01'; -- Value to remove
WITH CTE AS (
    SELECT S.ID,
           STRING_AGG(NULLIF(SS.[value], @Source), ',') WITHIN GROUP (ORDER BY S.ID) AS Categories
    FROM dbo.Sources S
    CROSS APPLY STRING_SPLIT(S.Categories, ',') SS
    GROUP BY S.ID)
UPDATE S
SET Categories = C.Categories
FROM dbo.Sources S
JOIN CTE C ON S.ID = C.ID;
GO
SELECT ID,
       Categories
FROM dbo.Sources;
GO
DROP TABLE dbo.Sources;
Although this seems like a bit of overkill compared to REPLACE, it shows why normalising the data is a far better idea in the first place, and how simple it is to actually do so.
You can use Replace as follows:
update dbo.sources set
    -- remove 'ABC01' whether it sits at the start, middle, or end of the list;
    -- NULLIF turns a fully emptied list into NULL
    Categories = nullif(replace(replace(replace(Categories, 'ABC01, ', ''), ', ABC01', ''), 'ABC01', ''), '')
where Categories like '%ABC01%';

JOIN ON subselect returns what I want, but surrounding select is missing records when subselect returns NULL

I have a table where I am storing records with a Created_On date and a Last_Updated_On date. Each new record will be written with a Created_On, and each subsequent update writes a new row with the same Created_On, but an updated Last_Updated_On.
I am trying to design a query to return the newest row of each. What I have looks something like this:
SELECT
t1.[id] as id,
t1.[Store_Number] as storeNumber,
t1.[Date_Of_Inventory] as dateOfInventory,
t1.[Created_On] as createdOn,
t1.[Last_Updated_On] as lastUpdatedOn
FROM [UserData].[dbo].[StoreResponses] t1
JOIN (
SELECT
[Store_Number],
[Date_Of_Inventory],
MAX([Created_On]) co,
MAX([Last_Updated_On]) luo
FROM [UserData].[dbo].[StoreResponses]
GROUP BY [Store_Number],[Date_Of_Inventory]) t2
ON
t1.[Store_Number] = t2.[Store_Number]
AND t1.[Created_On] = t2.co
AND t1.[Last_Updated_On] = t2.luo
AND t1.[Date_Of_Inventory] = t2.[Date_Of_Inventory]
WHERE t1.[Store_Number] = 123
ORDER BY t1.[Created_On] ASC
The sub-select works fine: I see X number of rows, grouped by Store_Number and Date_Of_Inventory, some of which have luo (Last_Updated_On) values of NULL. However, those rows in the sub-select where luo is null do not appear in the overall results. In other words, where I get 6 results in the sub-select, I only get 2 in the overall results, and they are only the rows where Last_Updated_On is not NULL.
So, as a test, I wrote the following:
SELECT 1 WHERE NULL = NULL
And got no results, but, when I run:
SELECT 1 WHERE 1 = 1
I get back a result of 1. It's as if SQL Server is not treating NULL as equal to NULL.
How can I fix this? Why won't two fields compare as equal when both values are NULL?
You could use Coalesce (example assuming Store_Number is an integer)
ON
Coalesce(t1.[Store_Number],0) = Coalesce(t2.[Store_Number],0)
By default (SET ANSI_NULLS ON), NULL does not compare equal to NULL.
You can change this behaviour (if your business case and your database design's usage of NULL requires it) with the following setting (note that it is deprecated):
SET ANSI_NULLS OFF
Another basic alternative is a workaround using:
ON ((t1.[Store_Number] = t2.[Store_Number]) OR
(t1.[Store_Number] IS NULL AND t2.[Store_Number] IS NULL))
Executing your POC:
SET ansi_nulls off
SELECT 1 WHERE NULL = NULL
Returns:
1
This also works, because INTERSECT uses set comparison semantics, under which two NULLs are treated as not distinct (i.e. they match):
AND EXISTS (SELECT t1.Store_Number INTERSECT SELECT t2.Store_Number)
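A minimal demonstration of the trick with illustrative values (not from the original answer):
-- Plain equality between two NULLs evaluates to UNKNOWN, but INTERSECT
-- treats NULLs as not distinct, so the EXISTS test succeeds.
DECLARE @a int = NULL, @b int = NULL;
SELECT CASE WHEN @a = @b THEN 'equal' ELSE 'unknown' END AS equality_check,
       CASE WHEN EXISTS (SELECT @a INTERSECT SELECT @b)
            THEN 'match' ELSE 'no match' END AS intersect_check;
-- equality_check = 'unknown', intersect_check = 'match'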

Select data from one table where a field is greater than that of another field in another table

I want to be able to select data from TableA where Field1 is greater than Field2 in TableB.
In my head I imagine it to be something like this:
Select TableA.*
from TableA
Join TableB
On TableA.PK = TableB.FK
WHERE TableA.Field1 > TableB.Field2
I am using SQL Server 2005, and TableA.Field1 and TableB.Field2 look like:
2004102881010 - data type - varchar
My PK and FK look like:
0908232 - data type - nvarchar
The problem is that when this query is run, ALL the data is displayed, not just the rows where Field1 is greater.
Cheers :)
Seems to be working correctly for this demo code. Perhaps I'm not understanding the problem or data.
;with TABLEA (PK, Field1) AS
(
-- Sample row that is filtered out
SELECT CAST('0908232' AS nvarchar(10)), CAST('2004102881010' AS varchar(50))
-- This is bigger than what's in B
UNION ALL SELECT CAST('0908232' AS nvarchar(10)), CAST('2005102881010' AS varchar(50))
)
, TABLEB(FK, Field2) AS
(
-- This matches row 1 above and will be excluded
SELECT CAST('0908232' AS nvarchar(10)), CAST('2004102881010' AS varchar(50))
)
SELECT TableA.*
FROM TableA
INNER JOIN TableB
ON TableA.PK = TableB.FK
WHERE TableA.Field1 > TableB.Field2
Results:
PK       Field1
-------  -------------
0908232  2005102881010
This seems like a problem with missing zeroes: 2004102881010 should presumably be 20041028081010 (note the added 0).
There is nothing wrong with your query; the problem is your data.
Consider 2001-01-01 01:01:01: without zero-padding this would be stored as 200111111, when it should be 20010101010101.
Comparison operators (>, <) used on strings (varchar, nvarchar, etc.) work alphabetically, character by character. For example, '9' > '11' is true. You might try doing a data type conversion; note that 13-digit values such as 2004102881010 overflow int, so bigint is needed:
WHERE cast(A.field1 as bigint) > cast(B.field2 as bigint)
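A quick illustration of the difference (values are illustrative):
-- Ordinal string comparison: '9' > '11' because '9' > '1' at the first character.
-- bigint is needed for 13-digit values like 2004102881010, which overflow int.
SELECT CASE WHEN '9' > '11' THEN 'true' ELSE 'false' END AS string_compare,
       CASE WHEN CAST('9' AS bigint) > CAST('11' AS bigint) THEN 'true' ELSE 'false' END AS numeric_compare;
-- string_compare = 'true', numeric_compare = 'false'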
