Bypassing TSQL CONTAINS filtering

Is there a best practice for bypassing full-text filtering of the result set when no search text is specified? What I do now is:
SELECT * FROM items
WHERE @search='all' OR CONTAINS(*,@search)
But I wonder if there is a more elegant way?

Your example is the most elegant way of writing the query. However, you should really look at the execution plan to see whether it is also the most performant approach.
You may want to consider writing the query like this to ensure SQL Server isn't evaluating the CONTAINS predicate when it isn't necessary.
IF (@search = 'all')
SELECT * FROM items
ELSE
SELECT * FROM items
WHERE CONTAINS(*,@search)

Related

Using an IN clause with table variable causes my query to run MUCH slower

I am using an SSRS report in which I need to pass multiple parameters to some SQL code.
Based on this blog post, the best way to handle multiple parameters is to use a split function, so that is the road I am following.
However, I am seeing bad performance after following this.
For example, the following WHERE clause will return the data in 4 seconds:
AND DimBusinessDivision.Id IN (
22
)
This will also correctly return in 4 seconds:
DECLARE @BusinessDivisionId INT = 22
AND DimBusinessDivision.Id IN (
@BusinessDivisionId
)
However, using the split function as below, it takes 2 minutes (which is the same time it takes without a WHERE clause):
AND DimBusinessDivision.Id IN (
SELECT Item FROM dbo.FuncSplit(@BusinessDivisionId, ',')
)
I've also tried creating a temp table and a table variable before the SQL statement with the results of the function, but there's no difference. I have a feeling this has to do with the fact that the values are not literal values and that SQL Server doesn't know what query plan to follow, or something similar. Does anyone know of any ways to increase the performance of this?
It simply doesn't like using a table to get the values in, even if the table has the same number of rows.
UPDATE: I have used the table function as an inner join, which has fixed the issue. Any ideas why this made all the difference?
INNER JOIN
dbo.FuncSplit(@BusinessDivisionIds, ',') AS FilteredBusinessDivisions ON
FilteredBusinessDivisions.Item = DimBusinessDivision.Id

A few things to play with:
Try the non-performant query and add OPTION (RECOMPILE); at the end of the query. If it magically runs much faster, then yes, the issue was a bad cached query plan. For more information on this specific problem, you can Google "parameter sniffing" for a more thorough explanation.
You may also want to look at the function definition and toss a RECOMPILE in there too, and see what difference that makes.
Look at the estimated query plan and try to determine the difference.
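As a rough sketch of the first suggestion, the hint goes at the very end of the statement. The fact table dbo.FactSales and its BusinessDivisionId column are hypothetical stand-ins for the rest of the report query, which the question does not show; dbo.FuncSplit and DimBusinessDivision are taken from the question.

```sql
-- Sketch only: dbo.FactSales and f.BusinessDivisionId are assumed names.
DECLARE @BusinessDivisionId varchar(100) = '22';

SELECT f.*
FROM dbo.FactSales AS f
INNER JOIN dbo.DimBusinessDivision AS DimBusinessDivision
    ON DimBusinessDivision.Id = f.BusinessDivisionId
WHERE DimBusinessDivision.Id IN (
    SELECT Item FROM dbo.FuncSplit(@BusinessDivisionId, ',')
)
OPTION (RECOMPILE);  -- compile a fresh plan for this execution instead of reusing a cached one
```

If the timing does not change with the hint, a stale cached plan is probably not the cause, and comparing the estimated and actual row counts in the plan is the next thing to check.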
But the root of the problem, I think, is that you are reinventing the wheel with this "split" function. You can have multi-valued parameters in SSRS and use "WHERE col IN (@param)": https://technet.microsoft.com/en-us/library/aa337396(v=sql.105).aspx
Unless there's a very specific reason you must split a comma separated list and cannot use normal parameters, just use a regular parameter that accepts multiple values.
Edit: I looked at the article you linked to. It's quite easy to have a SELECT ALL option in any reporting tool (not just SSRS), though it's not obvious. Using the "magic value" as written in the article you linked to works just fine. Can I ask what limitation is prompting you to need to do this string splitting?

How can I force a subquery to perform as well as a #temp table?

I am reiterating the question asked by Mongus Pong, "Why would using a temp table be faster than a nested query?", which doesn't have an answer that works for me.
Most of us at some point find that when a nested query reaches a certain complexity it needs to be broken into temp tables to keep it performant. It is absurd that this could ever be the most practical way forward, and it means these processes can no longer be made into a view. And often 3rd party BI apps will only play nicely with views, so this is crucial.
I am convinced there must be a simple queryplan setting to make the engine just spool each subquery in turn, working from the inside out. No second guessing how it can make the subquery more selective (which it sometimes does very successfully) and no possibility of correlated subqueries. Just the stack of data the programmer intended to be returned by the self-contained code between the brackets.
It is common for me to find that simply changing from a subquery to a #table takes the time from 120 seconds to 5. Essentially the optimiser is making a major mistake somewhere. Sure, there may be very time consuming ways I could coax the optimiser to look at tables in the right order but even this offers no guarantees. I'm not asking for the ideal 2 second execute time here, just the speed that temp tabling offers me within the flexibility of a view.
I've never posted on here before but I have been writing SQL for years and have read the comments of other experienced people who've also just come to accept this problem and now I would just like the appropriate genius to step forward and say the special hint is X...

There are a few possible explanations as to why you see this behavior. Some common ones are:
1. The subquery or CTE may be being repeatedly re-evaluated.
2. Materialising partial results into a #temp table may force a more optimum join order for that part of the plan by removing some possible options from the equation.
3. Materialising partial results into a #temp table may improve the rest of the plan by correcting poor cardinality estimates.
The most reliable method is simply to use a #temp table and materialize it yourself.
Failing that, regarding point 1, see "Provide a hint to force intermediate materialization of CTEs or derived tables". The use of TOP(large_number) ... ORDER BY can often encourage the result to be spooled rather than repeatedly re-evaluated.
Even if that works, however, there are no statistics on the spool.
For points 2 and 3 you would need to analyse why you weren't getting the desired plan. Possibly rewriting the query to use sargable predicates, or updating statistics might get a better plan. Failing that you could try using query hints to get the desired plan.
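A minimal sketch of both approaches follows. The tables dbo.Orders and dbo.Customers and their columns are hypothetical, chosen only for illustration; the TOP value is simply the largest int, and the optimizer is not guaranteed to spool because of it.

```sql
-- Option 1: materialise the intermediate result yourself.
-- The #temp table gets its own statistics, which can help the rest of the plan.
SELECT o.CustomerId, SUM(o.Amount) AS TotalAmount
INTO #CustomerTotals
FROM dbo.Orders AS o
GROUP BY o.CustomerId;

SELECT c.Name, t.TotalAmount
FROM dbo.Customers AS c
JOIN #CustomerTotals AS t ON t.CustomerId = c.Id;

DROP TABLE #CustomerTotals;

-- Option 2: TOP (large_number) ... ORDER BY inside the derived table.
-- This only encourages a spool; check the actual plan to see whether it worked.
SELECT c.Name, t.TotalAmount
FROM dbo.Customers AS c
JOIN (
    SELECT TOP (2147483647) o.CustomerId, SUM(o.Amount) AS TotalAmount
    FROM dbo.Orders AS o
    GROUP BY o.CustomerId
    ORDER BY o.CustomerId
) AS t ON t.CustomerId = c.Id;
```

Option 2 can live inside a view, which Option 1 cannot, but it is a nudge rather than a guarantee.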

I do not believe there is a query hint that instructs the engine to spool each subquery in turn.
There is the OPTION (FORCE ORDER) query hint, which forces the engine to perform the JOINs in the order specified and could potentially coax it into achieving that result in some instances. This hint will sometimes result in a more efficient plan for a complex query when the engine keeps insisting on a sub-optimal plan. Of course, the optimizer should usually be trusted to determine the best plan.
Ideally there would be a query hint that would allow you to designate a CTE or subquery as "materialized" or "anonymous temp table", but there is not.
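For illustration, this is roughly where the hint goes. The table and column names (dbo.TableFoo, dbo.TableBar, FooId, Deleted) are hypothetical; whether the forced order is actually faster has to be checked against the plan for the real query.

```sql
-- Sketch only: table and column names are assumed for illustration.
SELECT bar.*
FROM dbo.TableFoo AS foo                          -- joined first because it is written first
JOIN dbo.TableBar AS bar ON bar.FooId = foo.Id    -- joined second
WHERE foo.Deleted = 1
OPTION (FORCE ORDER);  -- keep the join order exactly as written in the FROM clause
```

Note that the hint applies to the whole statement, so every join in the query is pinned, not just the one you care about.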

Another option (for future readers of this article) is to use a user-defined function. Multi-statement functions (as described in How to Share Data between Stored Procedures) appear to force SQL Server to materialize the results of your subquery. In addition, they allow you to specify primary keys and indexes on the resulting table to help the query optimizer. This function can then be used in a select statement as part of your view. For example:
CREATE FUNCTION SalesByStore (@storeid varchar(30))
RETURNS @t TABLE (title varchar(80) NOT NULL PRIMARY KEY,
qty smallint NOT NULL) AS
BEGIN
INSERT @t (title, qty)
SELECT t.title, s.qty
FROM sales s
JOIN titles t ON t.title_id = s.title_id
WHERE s.stor_id = @storeid
RETURN
END
GO
-- CREATE VIEW must be the first statement in its batch, hence the GO above
CREATE VIEW SalesData AS
SELECT * FROM SalesByStore('6380')
GO

Having run into this problem, I found out that (in my case) SQL Server was evaluating the conditions in incorrect order, because I had an index that could be used (IDX_CreatedOn on TableFoo).
SELECT bar.*
FROM
(SELECT * FROM TableFoo WHERE Deleted = 1) foo
JOIN TableBar bar ON (bar.FooId = foo.Id)
WHERE
foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE())
I managed to work around it by forcing the subquery to use another index (i.e. one that would be used when the subquery was executed without the parent query). In my case I switched to PK, which was meaningless for the query, but allowed the conditions from the subquery to be evaluated first.
SELECT bar.*
FROM
(SELECT * FROM TableFoo WITH (INDEX([PK_Id])) WHERE Deleted = 1) foo
JOIN TableBar bar ON (bar.FooId = foo.Id)
WHERE
foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE())
Filtering by the Deleted column was really simple and filtering the few results by CreatedOn afterwards was even easier. I was able to figure it out by comparing the Actual Execution Plan of the subquery and the parent query.
A more hacky solution (and not really recommended) is to force the subquery to get executed first by limiting the results using TOP, however this could lead to weird problems in the future if the results of the subquery exceed the limit (you could always set the limit to something ridiculous). Unfortunately TOP 100 PERCENT can't be used for this purpose since SQL Server just ignores it.
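The TOP workaround just described might look like the sketch below. The limit of 1000000 is an arbitrary placeholder, not a value from the original answer; rows beyond it would be silently dropped, which is exactly the risk mentioned above.

```sql
-- Sketch only: the TOP limit is an assumed value and must exceed the
-- largest number of rows the subquery can ever return.
SELECT bar.*
FROM
    (SELECT TOP (1000000) * FROM TableFoo WHERE Deleted = 1) foo
JOIN TableBar bar ON (bar.FooId = foo.Id)
WHERE
    foo.CreatedOn > DATEADD(DAY, -7, GETUTCDATE())
```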

Eliminating code duplication when querying multiple tables with the same schema

I've inherited some code which uses multiple tables to store the same information depending on how old it is (one for the current day, the last month, etc.).
Currently most of the code is duplicated for every condition, and I'd like to try and eliminate the majority of the duplication in the stored procedures. Right now re-architecting the design is not an option as there are a number of applications that depend on the current design that I have no control over.
One option I've tried so far is loading the needed data into a temp table, which I found to have a rather large performance hit. I've also tried using a CTE structured like this:
;WITH cte_table(...)
AS
(
SELECT ...
FROM a
WHERE @queried_date = CONVERT(DATE, GETDATE())
UNION ALL
SELECT ...
FROM b
WHERE @queried_date BETWEEN --some range
)
This works and the performance isn't terrible, but it's not very nice looking.
Could anyone offer a better alternative?

Two suggestions:
Just use UNION, not UNION ALL. The UNION operator removes duplicates in that case. UNION ALL preserves dupes.
Using the CTE, the SELECT clause on the outside / end can have a DISTINCT operator to bring back unique rows. Of course, I'm not sure why you'd be using a CTE in this scenario, since UNION should work just fine. (In fact, I believe SQL Server will optimize the query to the same plan structure either way...)
Any way you slice it, if you have duplicate data, either you have to do something like the above, or you have to make explicit clauses that remove dupe cases, using things like #temp tables or WHERE ... NOT IN ().
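A small self-contained illustration of the difference, using inline VALUES lists instead of real tables (the aliases a, b and v are made up for the example):

```sql
-- UNION removes duplicate rows: the value 2 appears once, so 3 rows come back.
SELECT v FROM (VALUES (1), (2)) AS a(v)
UNION
SELECT v FROM (VALUES (2), (3)) AS b(v);

-- UNION ALL keeps every row: the value 2 appears twice, so 4 rows come back.
SELECT v FROM (VALUES (1), (2)) AS a(v)
UNION ALL
SELECT v FROM (VALUES (2), (3)) AS b(v);
```

The duplicate removal in UNION is not free, so UNION ALL is usually the cheaper choice when the branches cannot overlap.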

Use of With Clause in SQL Server

How does the WITH clause work in SQL Server? Does it really give me some performance boost, or does it just help to make scripts more readable?
When is it right to use it? What should you know about the WITH clause before you start to use it?
Here's an example of what I'm talking about:
http://www.dotnetspider.com/resources/33984-Use-With-Clause-Sql-Server.aspx

I'm not entirely sure about performance advantages, but I think it can definitely help in the case where using a subquery results in the subquery being performed multiple times.
Apart from that it can definitely make code more readable, and can also be used in the case where multiple subqueries would be a cut and paste of the same code in different places.
What should you know before you use it?
A big downside is that when you have a CTE in a view, you cannot create a clustered index on that view. This can be a big pain because indexed views are the closest thing SQL Server has to materialised views, and it has certainly bitten me before.

Unless you use recursive abilities, a CTE is not better performance-wise than a simple inline view.
It just saves you some typing.
The optimizer is free to decide whether to reevaluate it or not when it's being reused, and in most cases it decides to reevaluate:
WITH q (uuid) AS
(
SELECT NEWID()
)
SELECT *
FROM q
UNION ALL
SELECT *
FROM q
will return you two different NEWIDs.
Note that other engines may behave differently.
PostgreSQL, unlike SQL Server, materializes the CTEs.
Oracle supports a special hint, /*+ MATERIALIZE */, that tells the optimizer whether it should materialize the CTE or not.
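In SQL Server, if both references must see the same value, one option is to materialise the result yourself. This is a minimal sketch using a table variable (the name @q is chosen to mirror the CTE above); a #temp table would work the same way.

```sql
-- NEWID() is evaluated once, when the row is inserted.
DECLARE @q TABLE (uuid uniqueidentifier);

INSERT INTO @q (uuid)
SELECT NEWID();

-- Both branches read the stored row, so the two result rows carry the same value.
SELECT uuid FROM @q
UNION ALL
SELECT uuid FROM @q;
```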

WITH is a keyword in SQL that defines a named temporary result set, which can then be queried like a table. Example:
with a      -- a is the name of the temporary result set
(id)        -- id is the column name for a
as (select column_name from table_name)
select * from a

How do I return an empty result set from a procedure using T-SQL?

I'm interested in returning an empty result set from SQL Server stored procedures in certain events.
The intended behaviour is that a L2SQL DataContext.SPName().SingleOrDefault() will result in CLR null value.
I'm presently using the following solution, but I'm unsure whether it would be considered bad practice, a performance hazard (I could not find one by reading the execution plan), or if there is simply a better way:
SELECT * FROM [dbo].[TableName]
WHERE 0 = 1;
The execution plan is a constant scan with a trivial cost associated with it.
The reason I am asking this instead of simply not running any SELECTs is that I'm concerned previous SELECT @scalar or SELECT INTO statements could cause unintended result sets to be served back to L2SQL. Am I worrying over nothing?

If you need column names in the response then proceed with the select TOP 0 * from that table, otherwise just use SELECT TOP 0 NULL. It should work pretty fast :)

That is a reasonable approach. Another alternative is:
SELECT TOP 0 * FROM [dbo].[TableName]
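Inside a procedure this might be used as sketched below. The procedure name, the @id parameter and the Id column are hypothetical; only [dbo].[TableName] comes from the question.

```sql
-- Sketch only: dbo.GetRowOrNothing, @id and the Id column are assumed names.
CREATE PROCEDURE dbo.GetRowOrNothing
    @id int
AS
BEGIN
    SET NOCOUNT ON;

    IF @id IS NULL
    BEGIN
        -- Same columns as the normal result, but zero rows.
        SELECT TOP (0) * FROM [dbo].[TableName];
        RETURN;
    END;

    SELECT * FROM [dbo].[TableName] WHERE Id = @id;
END;
```

Because both branches return the same column list, the caller sees a consistent result shape either way.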

If you want to simply retrieve the metadata of a result set w/o any actual row, use SET FMTONLY ON.

I think the best solution is top 0 but not using a dummy table.
This does it for me:
select top 0 null as column1, null as column2
Using e.g. a system table may be fine for performance but looks unclean.

It's an entirely reasonable approach.
To alleviate any worries about performance (which you shouldn't have in the first place, since the server is smart enough to avoid a table scan for 1=0), pick a table that's very small and not heavily used. I'm sure your DB schema has one.
