Issue with the cross apply statement and from - database

I'm trying to build something but I'm stuck on an issue. I need a stored procedure in which, if one parameter of the procedure is the current date or older, two columns in a query and part of the WHERE clause are ignored. I initially used IF/ELSE to handle the cases, but the code is duplicated and hard to maintain. So I need to do the same thing without the IF/ELSE statement. Any help would be appreciated.
select
    Cats.*,
    Zone.[D] as [De], -- this part would be avoided if @Min and @Max are null
    Zone.[R] as [Re]  -- this part would be avoided if @Min and @Max are null
from
    flv
CROSS APPLY -- the whole cross apply would be avoided if @Min and @Max are null
(
    select
        x
    from
        XXXX
    where
        XXXX.IDx = 1
) as Data
where
    ---where clause
I just want to avoid the CROSS APPLY when both parameters are null.

Change XXXX.IDx = 1 into @Min IS NOT NULL AND @Max IS NOT NULL AND XXXX.IDx = 1 and your CROSS APPLY won't hurt a bit. However, that would mean no rows are returned from the CROSS APPLY, and thus from the whole query. So you'll also need to change the CROSS APPLY into an OUTER APPLY.
For example, the following script:
DECLARE @test INT;
SELECT * FROM dbo.tmpTable AS TT
OUTER APPLY (
    SELECT * FROM dbo.test AS T
    WHERE @test IS NOT NULL
) AS T;
This gives an execution plan that contains a Filter operator; pay special attention to it.
When you hover over the Filter operator, you may notice that it carries a startup predicate.
This means dbo.test in this case is only scanned if the startup predicate is met. Put OPTION (RECOMPILE) on your query and look at the actual query plan: the table is not even in the plan if @test is null.
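Applied to the shape of the query in the question, the same idea might look like the sketch below. The table variables @Cats and @Zones, their columns, and the BETWEEN filter are hypothetical stand-ins (the question does not show the real schema), and the parameters are written with T-SQL's @ prefix.

```sql
-- Hypothetical parameters; in the stored procedure these would be arguments.
DECLARE @Min INT = NULL, @Max INT = NULL;

-- Hypothetical stand-ins for the tables in the question.
DECLARE @Cats  TABLE (CatId INT, ZoneId INT);
DECLARE @Zones TABLE (ZoneId INT, D INT, R INT);
INSERT INTO @Cats  VALUES (1, 10), (2, 20);
INSERT INTO @Zones VALUES (10, 5, 7), (20, 50, 70);

SELECT c.CatId,
       z.D AS [De],   -- NULL when @Min or @Max is NULL
       z.R AS [Re]    -- NULL when @Min or @Max is NULL
FROM @Cats AS c
OUTER APPLY           -- OUTER keeps the outer rows when the inner query is empty
(
    SELECT zn.D, zn.R
    FROM @Zones AS zn
    WHERE @Min IS NOT NULL      -- parameter-only test: false for every row when NULL
      AND @Max IS NOT NULL
      AND zn.ZoneId = c.ZoneId
      AND zn.D BETWEEN @Min AND @Max
) AS z
OPTION (RECOMPILE);
```

With both parameters NULL, both rows of @Cats come back with NULL in De and Re; with a CROSS APPLY instead, the query would return no rows at all.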

Related

Set where clause basis parameter value in SQL Server

I wish to set a condition in the WHERE clause based on the @USER_ID input supplied by the user in my SQL Server query, but I guess I am missing something essential here.
DECLARE @USER_ID NVARCHAR(255) = 'a,b,c' -- will be multi-select parameter; can be null as well
select * from <table_name>
{
if: @USER_ID is NULL then no WHERE condition
else: WHERE <table_name>.<col_name> in (SELECT item FROM SplitString(@USER_ID,','))
}
Can someone please help here?
Personally, I would suggest switching to a table type parameter, then you don't need to use STRING_SPLIT at all. I'm not going to cover the creation of said type of parameter here though; there's plenty already out there and the documentation is more than enough to explain it too.
As for the problem, if we were to be using STRING_SPLIT you would use a method like below:
SELECT {Your Columns} --Don't use *
FROM dbo.YourTable YT
WHERE YT.YourColumn IN (SELECT SS.Value
                        FROM STRING_SPLIT(@USER_ID,',') SS)
   OR @USER_ID IS NULL
OPTION (RECOMPILE);
The RECOMPILE in the OPTION clause is important, as the plan for a query where @USER_ID has the value NULL is likely to be very different from the plan where it doesn't.
You could use a dynamic SQL approach, but for one parameter, I doubt there will be anything more than a negligible benefit. Using the above is much easier for others to understand as well, and the cost of generating the plan every time the query is run should be tiny for such a simple query.
Using a table type parameter, it would actually likely be more performant (assuming you have a useable index on YourColumn) to use a UNION ALL query like the below:
SELECT {Your Columns} --Don't use *
FROM dbo.YourTable YT
JOIN @Users U ON YT.YourColumn = U.UserId
UNION ALL
SELECT {Your Columns} --Don't use *
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1 FROM @Users U);
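For readers who have not used one before, a table type parameter could be set up roughly as follows. The names dbo.UserIdList, UserId, and dbo.GetRowsForUsers are hypothetical, the column type is only a guess that mirrors the NVARCHAR(255) in the question, and the procedure body is a placeholder that just echoes the parameter.

```sql
-- Hypothetical type: one row per selected user id.
CREATE TYPE dbo.UserIdList AS TABLE
(
    UserId NVARCHAR(255) NOT NULL PRIMARY KEY
);
GO

-- Hypothetical procedure; the real one would run the UNION ALL query above.
CREATE PROCEDURE dbo.GetRowsForUsers
    @Users dbo.UserIdList READONLY   -- table-valued parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;
    SELECT U.UserId FROM @Users AS U;
END;
GO

-- Calling it: fill a variable of the table type and pass it in.
DECLARE @Selected dbo.UserIdList;
INSERT INTO @Selected (UserId) VALUES (N'a'), (N'b'), (N'c');
EXEC dbo.GetRowsForUsers @Users = @Selected;
```

Passing an empty table then plays the role that NULL plays for the string parameter: the NOT EXISTS branch returns every row.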
You could avoid using STRING_SPLIT entirely here:
WHERE ',' + @USER_ID + ',' LIKE '%,' + <table_name>.<col_name> + ',%'
If @USER_ID is NULL, the first branch (@USER_ID IS NULL AND (1=1)) is always true, so every row is returned. If @USER_ID is not NULL, the OR branch applies and only rows whose <col_name> is one of (a, b, c) are returned.
SELECT *
FROM <table_name>
WHERE (1=1)
  AND ((@USER_ID IS NULL AND (1=1)) -- always true when @USER_ID is NULL
       OR (<table_name>.<col_name> in (SELECT [value] FROM SplitString(@USER_ID,',')))
      )

Understanding CTE Semicolon Placement

When I run this CTE in SQL Server it says the syntax is incorrect by the declare statement.
;WITH cte as
(
    SELECT tblKBFolders.FolderID
    from tblKBFolders
    where FolderID = @FolderID
    UNION ALL
    SELECT tblKBFolders.FolderID
    FROM tblKBFolders
    INNER JOIN cte
        ON cte.FolderID = tblKBFolders.ParentFolderID
)
declare @tblQueryFolders as table (FolderID uniqueidentifier)
insert into @tblQueryFolders
SELECT FolderID From cte;
But if I move the declare to before the CTE, it runs just fine.
declare @tblQueryFolders as table (FolderID uniqueidentifier)
;WITH cte as
(
    SELECT tblKBFolders.FolderID
    from tblKBFolders
    where FolderID = @FolderID
    UNION ALL
    SELECT tblKBFolders.FolderID
    FROM tblKBFolders
    INNER JOIN cte
        ON cte.FolderID = tblKBFolders.ParentFolderID
)
insert into @tblQueryFolders
SELECT FolderID From cte;
Why is that?
The answer you ask for was given in a comment already: This has nothing to do with the semicolon's placement.
Important: The CTE's WITH cannot follow directly after a statement that lacks a terminating semicolon. There are many statements where a WITH clause adds something to the end of the statement (query hints, the WITH after OPENJSON, etc.). The engine would have to guess whether this WITH belongs to the preceding statement or starts a CTE. That's the reason why we often see
;WITH cte AS (...)
That's actually the wrong usage of a semicolon. People put it there just so they don't forget it. In any case, it is considered better style and best practice to always end T-SQL statements with a semicolon (and not to use ;WITH, as it actually adds an empty statement).
A CTE is not much more than syntactical sugar. Putting the CTE's code within a FROM(SELECT ...) AS SomeAlias would be roughly the same. In most cases this would lead to the same execution plan. It helps in cases, where you'd have to write the same FROM(SELECT ) AS SomeAlias in multiple places. And - in general - it makes things easier to read and understand. But it is not - by any means - comparable to a temp table or a table variable. The engine will treat it as inline code and you can use it in the same statement exclusively.
So this is the same:
WITH SomeCTE AS(...some query here...)
SELECT SomeCTE.* FROM SomeCTE;
SELECT SomeAlias.*
FROM (...some query here...) AS SomeAlias;
Your example looks like you think of the CTE as kind of a temp table definition, which you can use in the following statements. But this is not correct.
After the CTE the engine expects another CTE or a final statement like SELECT or UPDATE.
WITH SomeCTE AS(...some query here...)
SELECT * FROM SomeCTE;
or
WITH SomeCTE AS( ...query... )
,AnotherCTE AS ( ...query... )
SELECT * FROM AnotherCTE;
...or another content added with the WITH clause:
WITH XMLNAMESPACES( ...namespace declarations...)
,SomeCTE AS( ...query... )
SELECT * FROM SomeCTE;
All of these examples are one single statement.
Putting a DECLARE @Something in the middle would break this concept.
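A minimal, self-contained sketch of that layout: every statement ends with a semicolon, the declaration comes first, and the CTE plus its INSERT form one single statement. The table variable @ids and the VALUES list are made up for illustration.

```sql
-- Declaration is its own statement and is terminated with a semicolon,
-- so the WITH below needs no leading semicolon.
DECLARE @ids TABLE (id INT);

-- One single statement: the CTE definition and the INSERT that consumes it.
WITH cte AS
(
    SELECT v.id
    FROM (VALUES (1), (2), (3)) AS v(id)
)
INSERT INTO @ids (id)
SELECT id
FROM cte;

-- The CTE no longer exists here; only the table variable does.
SELECT id FROM @ids;
```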

Parsing JSON with SQL: How to extract a record within a JSON object?

I'm looking at about 13,000 rows in a SQL Server table, and trying to parse out certain values within one column that is stored as json.
The json column values look something like this:
..."http://www.companyurl.com","FoundedYear":"2007","Status":"Private","CompanySize":"51-200","TagLine":"We build software we believe in","Origi...
I'd like to extract the value for "CompanySize", but not all rows include this attribute. Other complicating factors:
I'm not sure how many possible values there are within the "CompanySize" parameter.
"CompanySize" is not always followed by the "TagLine" parameter.
The one rule I know for certain: the CompanySize value is always a string of unknown length that follows the varchar string "CompanySize":" and terminates before the next "," string.
Ideally we would have upgraded fully to SQL Server 2016 so I'd be able to take advantage of SQL Server's JSON support, but that's not the case.
You can do this with CHARINDEX since you can pass it a start position, which will allow you to get the closing ". You probably shouldn't look for "," since if CompanySize is the final property, it won't have the ," at the end of that fragment. Doing this as an Inline Table-Valued Function (iTVF) will be pretty efficient (especially since 13k rows is almost nothing), you just need to use it with either CROSS APPLY or OUTER APPLY:
USE [tempdb];
GO
CREATE FUNCTION dbo.GetCompanySize(@JSON NVARCHAR(MAX))
RETURNS TABLE
AS RETURN
WITH SearchStart AS
(
    SELECT '"CompanySize":"' AS [Fragment]
), Search AS
(
    SELECT CHARINDEX(ss.Fragment, @JSON) AS [Start],
           LEN(ss.Fragment) AS [FragmentLength]
    FROM SearchStart ss
)
SELECT CASE Search.Start
         WHEN 0 THEN NULL
         ELSE SUBSTRING(@JSON,
                        (Search.Start + Search.FragmentLength),
                        CHARINDEX('"',
                                  @JSON,
                                  Search.Start + Search.FragmentLength
                                 ) - (Search.Start + Search.FragmentLength)
                       )
       END AS [CompanySize]
FROM Search;
GO
Set up the test:
CREATE TABLE #tmp (JSON NVARCHAR(MAX));
INSERT INTO #tmp (JSON) VALUES
('"http://www.companyurl.com","FoundedYear":"2007","Status":"Private","CompanySize":"51-200","TagLine":"We build software we believe in","Origi..');
INSERT INTO #tmp (JSON) VALUES
('"http://www.companyurl.com","FoundedYear":"2009","Status":"Public","TagLine":"We build software we believe in","Origi..');
INSERT INTO #tmp (JSON) VALUES (NULL);
Run the test:
SELECT comp.CompanySize
FROM #tmp tmp
CROSS APPLY tempdb.dbo.GetCompanySize(tmp.JSON) comp
Returns:
CompanySize
-----------
51-200
NULL
NULL
Building on @srutzky's answer, the following solution avoids creating a UDF (although you didn't say that was a constraint, it might be useful for some).
select
c.Id,
substring(i2.jsontail, 0, i3.[length]) CompanySize
from
Companies c cross apply
( select charindex('CompanySize":"', c.json) start ) i1 cross apply
( select substring(c.json, start + len('CompanySize":"'), len(c.json) - start ) jsontail ) i2 cross apply
( select charindex('"', i2.jsontail) [length] ) i3
where
i1.[start] != 0

Strange behavior of CTE

I just answered this: Generate scripts with new ids (also for dependencies)
My first attempt was this:
DECLARE @Form1 UNIQUEIDENTIFIER=NEWID();
DECLARE @Form2 UNIQUEIDENTIFIER=NEWID();
DECLARE @tblForms TABLE(id UNIQUEIDENTIFIER,FormName VARCHAR(100));
INSERT INTO @tblForms VALUES(@Form1,'test1'),(@Form2,'test2');
DECLARE @tblFields TABLE(id UNIQUEIDENTIFIER,FormId UNIQUEIDENTIFIER,FieldName VARCHAR(100));
INSERT INTO @tblFields VALUES(NEWID(),@Form1,'test1.1'),(NEWID(),@Form1,'test1.2'),(NEWID(),@Form1,'test1.3')
,(NEWID(),@Form2,'test2.1'),(NEWID(),@Form2,'test2.2'),(NEWID(),@Form2,'test2.3');
--These are the original IDs
SELECT frms.id,frms.FormName
,flds.id,flds.FieldName
FROM @tblForms AS frms
INNER JOIN @tblFields AS flds ON frms.id=flds.FormId ;
--The same with new ids
WITH FormsWithNewID AS
(
SELECT NEWID() AS myNewFormID
,*
FROM @tblForms
)
SELECT frms.myNewFormID, frms.id,frms.FormName
,NEWID() AS myNewFieldID,flds.FieldName
FROM FormsWithNewID AS frms
INNER JOIN @tblFields AS flds ON frms.id=flds.FormId
The second select should deliver - at least I thought so - two values in "myNewFormID", each three times... But it comes up with 6 different values. This would mean, that the CTE's "NEWID()" is done for each row of the final result set. What am I missing?
Your understanding of CTEs is wrong. They are not simply a table variable that's filled with the results of the query - instead, they are a query on their own. Note that CTEs can be used recursively - this would be quite a sight with table variables :)
From MSDN:
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.
The "can be thought of" is a bit deceiving: sure, it can be thought of that way, but it's not a result set. You don't see this manifesting when you're only using pure functions, but as you've noticed, NEWID() is not pure. In reality, it's more like a named subquery; in your example, you'll get the same thing if you just move the query from the CTE to the FROM clause directly.
To illustrate this even further, you can add another join on the CTE to the query:
WITH FormsWithNewID AS
(
SELECT NEWID() AS myNewFormID
,*
FROM @tblForms
)
SELECT frms.myNewFormID, frms.id,frms.FormName
,NEWID() AS myNewFieldID,flds.FieldName,
frms2.myNewFormID
FROM FormsWithNewID AS frms
INNER JOIN @tblFields AS flds ON frms.id=flds.FormId
left join FormsWithNewID as frms2 on frms.id = frms2.id
You'll see that the frms2.myNewFormID contains different myNewFormIDs.
Keep this in mind - you can only treat the CTE as a result set when you're only using pure functions on non-changing data; in other words, if executing the same query in a serializable transaction isolation level twice will produce the same result sets.
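If the new ids do need to behave like a stored result set, one option (a sketch, not the only way) is to materialize them once into a table variable or temp table and query that instead of the CTE. The table variable @FormsWithNewID below is an assumption introduced for illustration; the other names follow the question.

```sql
DECLARE @tblForms TABLE (id UNIQUEIDENTIFIER, FormName VARCHAR(100));
INSERT INTO @tblForms VALUES (NEWID(), 'test1'), (NEWID(), 'test2');

-- Materialize the new ids: NEWID() runs once per form row, at insert time.
DECLARE @FormsWithNewID TABLE
(
    id          UNIQUEIDENTIFIER,
    myNewFormID UNIQUEIDENTIFIER,
    FormName    VARCHAR(100)
);
INSERT INTO @FormsWithNewID (id, myNewFormID, FormName)
SELECT id, NEWID(), FormName
FROM @tblForms;

-- Both references now read the same stored values,
-- so the two myNewFormID columns match on every row.
SELECT a.FormName,
       a.myNewFormID,
       b.myNewFormID AS sameNewFormID
FROM @FormsWithNewID AS a
INNER JOIN @FormsWithNewID AS b ON a.id = b.id;
```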
NEWID() returns a new value every time it is executed; whenever you use it, you get a different value.
For example,
select top 5 newid()
from sys.tables
order by newid()
You will not see the results in sorted order, because the value in the select list is generated separately from the value used by the ORDER BY.
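For contrast, a small variation that should come back sorted: giving the column an alias (the name id is arbitrary) and ordering by that alias makes the ORDER BY refer to the select-list column rather than to a second NEWID() call.

```sql
-- ORDER BY refers to the aliased select-list column,
-- not to a separate NEWID() expression.
SELECT TOP (5) NEWID() AS id
FROM sys.tables
ORDER BY id;
```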

Awkward JOIN causes poor performance

I have a stored procedure that combines data from several tables via UNION ALL. If the parameters passed in to the stored procedure don't apply to a particular table, I attempt to "short-circuit" that table by using "helper bits", e.g. @DataSomeTableExists, and adding a corresponding condition in the WHERE clause, e.g. WHERE @DataSomeTableExists = 1.
One (pseudo) table in the stored procedure is a bit awkward and causing me some grief.
DECLARE @DataSomeTableExists BIT = (SELECT CASE WHEN EXISTS(SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable') THEN 1 ELSE 0 END);
...
UNION ALL
SELECT *
FROM REF_MinuteDimension AS dim WITH (NOLOCK)
CROSS JOIN (SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable') AS T
CROSS APPLY dbo.fGetLastValueFromSomeTable(T.ParentId, dim.TimeStamp) dpp
WHERE @DataSomeTableExists = 1 AND dim.TimeStamp >= @StartDateTime AND dim.TimeStamp <= @EndDateTime
UNION ALL
...
Note: REF_MinuteDimension is nothing more than smalldatetimes with minute increments.
(1) The execution plan (below) indicates a warning on the nested loops operator saying that there is no join predicate. This is probably not good, but there really isn't a natural join between the tables. Is there a better way to write such a query? For each ParentId in T, I want the value from the UDF for every minute between @StartDateTime and @EndDateTime.
(2) Even when @DataSomeTableExists = 0, there is I/O activity on the tables in this query as reported by SET STATISTICS IO ON and the actual execution plan. The execution plan reports 14.2 % cost, which is too much considering these tables don't even apply in this case.
SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable' comes back empty.
Is it the way my query is written? Why wouldn't the helper bit or an empty T short circuit this query?
For 2) I can say for sure that the line
CROSS JOIN (SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable') AS T
will force #T to be analysed and to enter a join. You can create two versions of the SP, with and without that join, and use the flag to execute one or the other, but I cannot say whether it will save any response time, CPU clocks, I/O bandwidth, or memory.
For 1) I suggest removing the (NOLOCK) if you are using SQL Server 2005 or later, and keeping a close eye on that UDF. I cannot say more without a good SQL Fiddle.
I should mention, I have no clue if this will ever work, as it's kind of an odd way to write a sproc and table-valued UDFs aren't well understood by the query optimizer. You might have to build your resultset into a table variable or temp table conditionally, based on IF statements, then return that data. But I would try this, first:
--helper bit declared
declare @DataSomeTableExists BIT = 0x0
if exists (select 1 from #T where StorageTable = 'DATA_SomeTable')
begin
    set @DataSomeTableExists = 0x1
end
...
UNION ALL
SELECT *
FROM REF_MinuteDimension AS dim WITH (NOLOCK)
CROSS JOIN (SELECT * FROM #T WHERE StorageTable = 'DATA_SomeTable' and @DataSomeTableExists = 0x1) AS T
CROSS APPLY dbo.fGetLastValueFromSomeTable(T.ParentId, dim.TimeStamp) dpp
WHERE @DataSomeTableExists = 0x1 AND dim.TimeStamp >= @StartDateTime AND dim.TimeStamp <= @EndDateTime
UNION ALL
...
And if you don't know already, the UDF might be giving you weird readings in the execution plans. I don't know enough to give you accurate data, but you should search around to understand the limitations.
Since your query is dependent on run-time variables, consider using dynamic SQL to create your query on the fly. This way you can include the tables you want and exclude the ones you don't want.
There are downsides to dynamic SQL, so read up on them first.
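A minimal sketch of that approach, assuming a helper bit decides whether a branch is included. The catalog views sys.tables and sys.views merely stand in for the real tables so that the script runs anywhere, and @IncludeSecondBranch, @Start, and @End are hypothetical names. The values are passed as parameters to sp_executesql rather than concatenated into the string.

```sql
DECLARE @IncludeSecondBranch BIT = 0;   -- hypothetical helper bit
DECLARE @StartDateTime SMALLDATETIME = '2000-01-01 00:00';
DECLARE @EndDateTime   SMALLDATETIME = '2070-01-01 00:00';

-- The first branch is always part of the statement.
DECLARE @sql NVARCHAR(MAX) = N'
SELECT name, create_date
FROM sys.tables
WHERE create_date >= @Start AND create_date <= @End';

-- The second branch is only added to the text when it applies,
-- so the optimizer never sees it otherwise.
IF @IncludeSecondBranch = 1
    SET @sql += N'
UNION ALL
SELECT name, create_date
FROM sys.views
WHERE create_date >= @Start AND create_date <= @End';

EXEC sys.sp_executesql
    @sql,
    N'@Start SMALLDATETIME, @End SMALLDATETIME',
    @Start = @StartDateTime,
    @End   = @EndDateTime;
```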

Resources