SQL Server - Row Level Security using CROSS APPLY

I'm developing a filter predicate for Row Level Security in SQL Server/Azure SQL Database.
The application logic related to row visibility requires that many tables be read in order to determine whether a given user can or cannot read a row. I developed the following design:
- an inline table-valued function for the filter predicate;
- inside it, a CTE that gets all the profiles for the user, whose results are joined with a set of inline table-valued functions using the CROSS APPLY operator.
Here is the code:
CREATE FUNCTION [scr].[prj_Projects](@ProjectId INT, @FilterId1 INT, @FilterId2 INT, @FilterId3 INT, @FilterId4 INT, @FilterId5 INT, @FilterId6 INT)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
    WITH UserProfiles AS (
        SELECT up.Id
        FROM dbo.users u
        INNER JOIN dbo.UsersProfiles up ON up.UserId = u.Id
        INNER JOIN dbo.Profiles p ON p.id = up.ProfileId
        WHERE SESSION_CONTEXT(N'UserId') = u.Id
    )
    SELECT Result = 1
    FROM UserProfiles up
    CROSS APPLY [scr].[prj_ProfilesFilter1](up.Id, @FilterId1)
    CROSS APPLY [scr].[prj_ProfilesFilter2](up.Id, @FilterId2)
    CROSS APPLY [scr].[prj_ProfilesFilter3](up.Id, @FilterId3)
    CROSS APPLY [scr].[prj_ProfilesFilter4](up.Id, @FilterId4)
    CROSS APPLY [scr].[prj_ProfilesFilter5](up.Id, @FilterId5)
    CROSS APPLY [scr].[prj_ProfilesFilter6](up.Id, @FilterId6)
)
GO
Below is the query for one of the ITVFs (they all have the same structure).
CREATE OR ALTER FUNCTION [scr].[prj_ProfilesFilter1] (@UserProfileId INTEGER, @FilterId1 INTEGER)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN (
    WITH UserProfile AS (
        SELECT DISTINCT upba.FilterId1
        FROM dbo.UsersProfilesFilters upba
        WHERE upba.UserProfileId = @UserProfileId
    ), Datas AS (
        SELECT b.Id
        FROM dbo.Filters1 b
        INNER JOIN UserProfile c ON c.FilterId1 = b.Id
        UNION ALL
        SELECT b.Id
        FROM dbo.Filters1 b
        WHERE NOT EXISTS (SELECT 1 FROM UserProfile)
        UNION ALL
        SELECT -1
        WHERE NOT EXISTS (SELECT 1 FROM UserProfile)
    )
    SELECT Id
    FROM Datas d
    WHERE d.Id = ISNULL(@FilterId1, -1)
)
GO
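For context, a filter predicate like this is attached to its target table with a security policy. A minimal sketch of that step, which the post doesn't show - the table name dbo.Projects and its column list are assumptions:
CREATE SECURITY POLICY [scr].[ProjectsFilterPolicy]
ADD FILTER PREDICATE [scr].[prj_Projects](
        ProjectId, FilterId1, FilterId2, FilterId3,
        FilterId4, FilterId5, FilterId6)   -- assumed columns of dbo.Projects
ON dbo.Projects
WITH (STATE = ON);
GO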
I thought the design would be fine, but unfortunately performance is very bad. The issue is not visible in the execution plan (I see only seeks, no scans, for example); the problem is the very high scan count and number of logical reads the query performs. It's strange, because each CROSS APPLY returns only one row and there are only set operations.
Do you have any ideas on how to avoid this high number of logical reads?
I think it's a bug related to RLS.
UPDATE:
Here the execution plan of the query: https://www.brentozar.com/pastetheplan/?id=r1mHXespO
As I said, the problem is the number of logical reads and the scan count the query performs; the execution plan itself seems OK.

OK, I figured out the problem: the result of SESSION_CONTEXT must be cast, otherwise SQL Server cannot make correct assumptions about the cardinality of the query. After casting SESSION_CONTEXT, performance became extremely good.
WHERE CAST(SESSION_CONTEXT(N'UserMail') AS NVARCHAR(255)) = u.Email
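The same fix applies to the predicate function shown above, which compares SESSION_CONTEXT(N'UserId') to u.Id; a sketch, assuming that column is an INT:
-- Inside [scr].[prj_Projects], the untyped comparison
--     WHERE SESSION_CONTEXT(N'UserId') = u.Id
-- becomes a typed one the optimizer can estimate cardinality for:
WHERE CAST(SESSION_CONTEXT(N'UserId') AS INT) = u.Id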

Related

SQL - join two tables based on up-to-date entries

I have two tables:
1- TestModules
2- TestModule_Results
(table screenshots omitted)
To get the required information for each TestModule, I am using a FULL OUTER JOIN and it works fine. But what is required is slightly different: the join result (screenshot omitted) lists TestModuleID = 5 twice, and the requirement is to list only the up-to-date results based on the 'ChangedAt' time.
Of course, I can do the following:
SELECT TOP 1 * FROM TestModule_Results
WHERE DeviceID = 'xxx' and TestModuleID = 'yyy'
ORDER BY ChangedAt DESC
But this solution works for a single row only, and I want to do it in a stored procedure.
The expected output (screenshot omitted) should list each module with only its most recent result.
Any advice on how I can implement this in an SP?
Use a Common Table Expression and ROW_NUMBER() to add a field identifying the newest result, if any, and select just those:
--NOTE: a Common Table Expression requires the previous command
--to be explicitly terminated; prepending a ; covers that
;WITH cteTR AS (
    SELECT *
        , ROW_NUMBER() OVER (PARTITION BY DeviceID, TestModuleID
                             ORDER BY ChangedAt DESC) AS ResultOrder
    FROM TestModule_Results
    --cteTR is now just like TestModule_Results but has an
    --additional field ResultOrder that is 1 for the newest,
    --2 for the second newest, etc. for every unique (DeviceID, TestModuleID) pair
)
SELECT *
FROM TestModules AS M --Use INNER JOIN to get only modules with results,
                      --or LEFT OUTER JOIN to include modules without any results yet
INNER JOIN cteTR AS R
    ON M.DeviceID = R.DeviceID AND M.TestModuleID = R.TestModuleID
WHERE R.ResultOrder = 1
-- OR R.ResultOrder IS NULL --add if Left Outer Join
You say "this solution is for a single row"? Excellent. Use CROSS APPLY and change the WHERE clause from hand-input literal to the fields of the original table. APPLY operates at row level.
SELECT *
FROM TestModules t
CROSS APPLY
(
    SELECT TOP 1 *
    FROM TestModule_Results r
    WHERE r.DeviceID = t.DeviceID          -- put the connecting fields here
      AND r.TestModuleID = t.TestModuleID
    ORDER BY r.ChangedAt DESC
) tr
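If modules without any results yet should also appear (the LEFT OUTER JOIN case mentioned in the CTE answer), OUTER APPLY can be swapped in; a minimal sketch:
SELECT *
FROM TestModules t
OUTER APPLY
(
    SELECT TOP 1 *
    FROM TestModule_Results r
    WHERE r.DeviceID = t.DeviceID
      AND r.TestModuleID = t.TestModuleID
    ORDER BY r.ChangedAt DESC
) tr
-- OUTER APPLY keeps every TestModules row even when the subquery
-- returns nothing, filling the tr columns with NULLs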

Joining Results of Stored Procedure with Temporary Table

I have two tables that I have joined together. I'd like to join the result of that join with the results of a stored procedure that takes two parameters.
I'm not sure whether or not I should create two temporary tables or another function, so I'm a little lost on where I should even start and what the easiest method would be.
Below is my first join.
SELECT *
FROM dbo.Users a WITH (NOLOCK)
JOIN Company b ON a.email = b.email
Below is my stored procedure; all it does is split one column into multiple rows (Split is another function). I would like to use an inner join.
SELECT a.*, b.*
FROM [dbo].[Menu] a
CROSS APPLY dbo.Split(SalesPersons, ',') b
WHERE ID = @ID AND Date = @Date
The easiest way to do this, assuming that the output from the stored procedure is deterministic, would be to populate the output of the stored procedure into a temp table and then join to it.
CREATE TABLE #tmp
(
    ID INT NOT NULL,   -- adjust the column list to match your procedure's output
    COL2 INT NOT NULL
)

INSERT INTO #tmp
EXEC sproc_YourSproc 'Params'

SELECT *
FROM dbo.Users u
INNER JOIN dbo.Company c ON u.email = c.email
INNER JOIN #tmp t ON t.ID = c.ID
That being said, as Martin Smith said above, you probably want to move that logic into the stored procedure if possible.
Also, please don't use (NOLOCK); it doesn't really help the way most people think it does, and it can cause some really nasty results (double-reading rows, ghost records, etc.).
If you need to be able to perform reads without causing read/write contention, I would investigate using more optimistic isolation levels, find ways to optimize the read performance to reduce possible congestion, or find indexing strategies that would make it possible to satisfy reads without locking the table itself.
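For example, the read-committed snapshot option gives readers the last committed version of a row instead of blocking on writers. A minimal sketch - the database name is a placeholder, and row versioning adds tempdb overhead:
-- Readers see the last committed row version rather than waiting on writer locks
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;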

Too many parameter values slowing down query

I have a query that runs fairly fast under normal circumstances. But it is running very slow (at least 20 minutes in SSMS) due to how many values are in the filter.
Here's the generic version of it, and you can see that one part is filtering by over 8,000 values, making it run slow.
SELECT DISTINCT
    a.column
FROM table_a a
JOIN table_b b ON (a.KEY = b.KEY)
WHERE a.date BETWEEN @Start AND @End
    AND b.ID IN (... over 8,000 values)
    AND b.place IN (... 20 values)
ORDER BY a.column ASC
It's to the point where it's too slow to use in the production application.
Does anyone know how to fix this, or optimize the query?
To make a query fast, you need indexes.
You need a separate index for each of the following columns: a.KEY, b.KEY, a.date, b.ID, b.place.
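Taken literally, that advice translates to something like the following sketch; the table and column names are the generic ones from the question, bracketed because KEY clashes with a reserved word:
-- one index per joined/filtered column, per the advice above
CREATE INDEX IX_a_KEY   ON table_a ([KEY]);
CREATE INDEX IX_a_date  ON table_a ([date]);
CREATE INDEX IX_b_KEY   ON table_b ([KEY]);
CREATE INDEX IX_b_ID    ON table_b ([ID]);
CREATE INDEX IX_b_place ON table_b ([place]);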
As gotqn wrote before, if you put your 8,000 items into a temp table and inner join it, the query will be even faster, but without the indexes on the other side of the join it will still be slow.
What you need is to put the filtering values into a temporary table, then apply the filtering using an INNER JOIN instead of WHERE IN. For example:
IF OBJECT_ID('tempdb..#FilterDataSource') IS NOT NULL
BEGIN;
    DROP TABLE #FilterDataSource;
END;

CREATE TABLE #FilterDataSource
(
    [ID] INT PRIMARY KEY
);

INSERT INTO #FilterDataSource ([ID])
SELECT ...; -- you need to split the values

SELECT DISTINCT a.column
FROM table_a a
INNER JOIN table_b b
    ON (a.KEY = b.KEY)
INNER JOIN #FilterDataSource FS
    ON b.ID = FS.ID
WHERE a.date BETWEEN @Start AND @End
    AND b.place IN (... 20 values)
ORDER BY a.column ASC;
A few important notes:
- we are using a temporary table in order to allow parallel execution plans to be used;
- if you have a fast function for splitting (a CLR function, for example), you can join the function itself;
- it is not good to use IN with many values; SQL Server is not always able to build the execution plan, which may lead to timeouts/internal errors - you can find more information here.
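If the 8,000 values arrive as one delimited string, SQL Server 2016+ can do the splitting with the built-in STRING_SPLIT instead of a CLR function. A minimal sketch - the @IdList parameter is hypothetical:
DECLARE @IdList NVARCHAR(MAX) = N'101,102,103'; -- hypothetical CSV of the 8,000 IDs
INSERT INTO #FilterDataSource ([ID])
SELECT DISTINCT CAST(value AS INT)
FROM STRING_SPLIT(@IdList, N',');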

How to improve performance of this SQL Server query?

I was asked this question at a web developer interview. After my answer, the interviewer said "you're in the second table" :(
I have two tables employee and bademployee:
employee (empid int pk, name varchar(20))
bademployee (badempid int pk, name varchar(20))
Now, I want to select only good employees.
My answer was:
SELECT *
FROM employee
WHERE empid NOT IN (SELECT badempid from bademployee)
He said this query is not good for performance.
Can anyone tell me how to write a query for the same result without using negative terms (NOT IN, !=)?
Can it be done using a LEFT OUTER JOIN?
This can be rewritten using an OUTER JOIN with a NULL check or by using NOT EXISTS. I prefer NOT EXISTS:
SELECT *
FROM Employee e
WHERE NOT EXISTS (
SELECT 1
FROM bademployee b
WHERE e.empid = b.badempid)
Here is the OUTER JOIN version, but I believe you'll have better performance with NOT EXISTS.
SELECT e.*
FROM Employee e
LEFT JOIN bademployee b ON e.empid = b.badempid
WHERE b.badempid IS NULL
Here's an interesting article about the performance differences: http://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join
Whatever someone else may say, you need to check the execution plan and base your conclusion on what it says. Never just trust someone who claims this or that; research their claims and verify them against the documentation on the subject and, in this case, the execution plan, which clearly tells you what is going on.
One example from the SQL Authority blog shows the LEFT JOIN solution performing much worse than the NOT IN solution. This is due to a LEFT ANTI SEMI JOIN performed by the query planner, which generally performs a lot better than a LEFT JOIN + NULL check. There may be exceptions when there are very few rows. The author also tells you afterwards the same thing I said in the first paragraph: always check the execution plan.
Another blog post, from the SQL Performance blog, goes into this further with actual performance testing results.
TL;DR: In terms of performance, NOT EXISTS and NOT IN are on the same level, but NOT EXISTS is preferred due to issues with NULL values. Also, don't just trust what anyone claims; research and verify against your execution plan.
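The NULL issue is easy to demonstrate; a minimal sketch, not taken from the linked article:
-- As soon as the subquery yields a NULL, 'id NOT IN (...)' evaluates to
-- UNKNOWN for every row, so this returns zero rows
SELECT t.id
FROM (VALUES (1), (2)) AS t(id)
WHERE t.id NOT IN (SELECT CAST(NULL AS INT));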
I think the interviewer was wrong about the performance difference. Because the joined column is unique and not null in both tables, the NOT IN, NOT EXISTS, and LEFT JOIN...WHERE IS NULL queries are semantically identical. SQL is a declarative language, so the SQL Server optimizer may provide optimal and identical plans regardless of how the query is expressed. That said, it is not always perfect, so there may be variances, especially with more complex queries.
Below is a script that demonstrates this. On my SQL Server 2014 box, I see identical execution plans for the first 2 queries (ordered clustered index scans and a merge join), and the addition of a filter operator in the last. I would expect identical performance with all 3 so it doesn't really matter from a performance perspective. I would generally use NOT EXISTS because the intent is clearer and it avoids the gotcha in the case a NULL is returned by the NOT IN subquery, thus resulting in zero rows returned due to the UNKNOWN predicate result.
I would not generalize performance comparisons like this. If the joined columns allow NULL or are not guaranteed to be unique, these queries are not semantically the same and may yield different execution plans as a result.
CREATE TABLE dbo.employee (
empid int CONSTRAINT pk_employee PRIMARY KEY
, name varchar(20)
);
CREATE TABLE dbo.bademployee (
badempid int CONSTRAINT pk_bademployee PRIMARY KEY
, name varchar(20)
);
WITH
t4 AS (SELECT n FROM (VALUES(0),(0),(0),(0)) t(n))
,t256 AS (SELECT 0 AS n FROM t4 AS a CROSS JOIN t4 AS b CROSS JOIN t4 AS c CROSS JOIN t4 AS d)
,t16M AS (SELECT ROW_NUMBER() OVER (ORDER BY (a.n)) AS num FROM t256 AS a CROSS JOIN t256 AS b CROSS JOIN t256 AS c)
INSERT INTO dbo.employee(empid, name)
SELECT num, 'Employee name ' + CAST(num AS varchar(10))
FROM t16M
WHERE num <= 10000;
INSERT INTO dbo.bademployee(badempid, name)
SELECT TOP 5 PERCENT empid, name
FROM dbo.employee
ORDER BY NEWID();
GO
UPDATE STATISTICS dbo.employee WITH FULLSCAN;
UPDATE STATISTICS dbo.bademployee WITH FULLSCAN;
GO
SELECT *
FROM employee
WHERE empid NOT IN (SELECT badempid from bademployee);
SELECT *
FROM Employee e
WHERE NOT EXISTS (
SELECT 1
FROM bademployee b
WHERE e.empid = b.badempid);
SELECT e.*
FROM Employee e
LEFT JOIN bademployee b ON e.empid = b.badempid
WHERE b.badempid IS NULL;
GO

CROSS APPLY with table valued function restriction performance

I have a problem with CROSS APPLY and a parameterised table-valued function.
Here is a simplified pseudo-code example:
SELECT *
FROM (
SELECT lor.*
FROM LOT_OF_ROWS_TABLE lor
WHERE ...
) AS lor
CROSS APPLY dbo.HeavyTableValuedFunction(lor.ID) AS htvf
INNER JOIN ANOTHER_TABLE AS at ON lor.ID = at.ID
WHERE ...
The inner select on LOT_OF_ROWS_TABLE returns many rows.
Joining LOT_OF_ROWS_TABLE and ANOTHER_TABLE returns only one or a few rows.
The table-valued function is very time-consuming, and when it is called for a lot of rows the select takes a very long time.
My problem:
The function is called for all rows returned from LOT_OF_ROWS_TABLE, regardless of the fact that the data will be limited by the join to ANOTHER_TABLE.
The select has to stay in the format shown - it is generated, and in reality it is much more complicated.
When I try to rewrite it, it can be very fast, but it cannot be rewritten like this:
SELECT *
FROM (
SELECT lor.*
FROM LOT_OF_ROWS_TABLE lor
WHERE ...
) AS lor
INNER JOIN ANOTHER_TABLE AS at ON lor.ID = at.ID
CROSS APPLY dbo.HeavyTableValuedFunction(at.ID) AS htvf
WHERE ...
I'd like to know:
Is there any setting or hint or something that forces the select to call the function only for the finally restricted rows?
Thank you.
EDIT:
The table valued function is very complex: http://pastebin.com/w6azRvxR.
The select we are talking about is "user configured" and generated: http://pastebin.com/bFbanY2n.
You can divide this query into two parts, using either a table variable or a temp table.
SELECT lor.*, at.* INTO #tempresult
FROM (
    SELECT lor.*
    FROM LOT_OF_ROWS_TABLE lor
    WHERE ...
) lor
INNER JOIN ANOTHER_TABLE AS at ON lor.ID = at.ID
WHERE ...
Now do the time-consuming part, the table-valued function:
SELECT * FROM #tempresult
CROSS APPLY dbo.HeavyTableValuedFunction(#tempresult.ID) AS htvf
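The table variable route mentioned above looks much the same; a sketch, with an assumed column list since the real queries are generated:
DECLARE @tempresult TABLE (ID INT PRIMARY KEY /* ...other columns you need... */);

INSERT INTO @tempresult (ID)
SELECT lor.ID
FROM LOT_OF_ROWS_TABLE lor
INNER JOIN ANOTHER_TABLE AS at ON lor.ID = at.ID;

SELECT *
FROM @tempresult t
CROSS APPLY dbo.HeavyTableValuedFunction(t.ID) AS htvf;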
I believe this is what you are looking for:
Plan Forcing Scenario: Create a Plan Guide to Force a Plan Obtained from a Rewritten Query
Basically, it describes rewriting the query so that the generated plan uses the correct join order, then saving off that plan and forcing your existing query (which does not get changed) to use the saved plan.
The BOL link I put in even gives a specific example of rewriting the query, putting the joins in a different order and using a FORCE ORDER hint, then using sp_create_plan_guide to take the plan from the rewritten query and use it on the original query.
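Roughly, creating such a guide might look like this; a sketch only - the guide name is hypothetical and @stmt must match the application's generated statement text exactly:
EXEC sp_create_plan_guide
    @name = N'Guide_HeavyTVF',   -- hypothetical name
    @stmt = N'SELECT ...',       -- exact text of the generated query
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = NULL,
    @hints = N'OPTION (FORCE ORDER)';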
YES and NO... it's hard to interpret what you're trying to achieve without sample data in and the expected result out, to compare outcomes.
I'd like to know:
Is there any setting or hint or something that forces the select to call
the function only for the finally restricted rows?
So I'll answer your question above (3 years later!!) directly, with a direct statement:
You need to learn about CTEs and the difference between CROSS APPLY
and INNER JOIN, and why using CROSS APPLY in your case is
necessary. You "could" take the code in your function and fold it
into a single SQL statement using a CTE.
ie:
Read this and this.
Essentially, something like this...
WITH t2o AS
(
    SELECT t2.*, ROW_NUMBER() OVER (PARTITION BY t1_id ORDER BY rank) AS rn
    FROM t2
)
SELECT t1.*, t2o.*
FROM t1
INNER JOIN t2o
    ON t2o.t1_id = t1.id
   AND t2o.rn <= 3
Apply your query to extract the data you want once, using a CTE, then apply your second SQL using the CROSS APPLY.
You have no choice. You cannot do what you're trying to do in one SQL statement.
