I'm trying to query an XML column in SQL Server.
I've created a primary XML index on the column and query it using:
SELECT *
FROM MyTable
where Doc.exist('/xml/root/propertyx/text()[. = "something"]') = 1
In a table with 60,000 rows, this query takes about 100 ms on my local dev machine.
Is it possible to optimize this somehow to increase performance of the query?
You can optimize for fast query times with a computed (calculated) column. A computed column can't use the XML methods directly, so you have to wrap them in a function:
go
create function dbo.GetSomethingExists(
@Doc xml)
returns bit
with schemabinding
as begin return (
select @Doc.exist('/xml/root/property/text()[. = "something"]')
) end
go
create table TestTable (
Doc xml,
SomethingExists as dbo.GetSomethingExists(Doc) persisted
)
go
If you declare the function with schemabinding, you can create an index on SomethingExists:
create index IX_TestTable_SomethingExists on TestTable(SomethingExists)
This should make the query much faster.
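The answer does not show the rewritten query, so here is a minimal sketch of what it might look like, reusing the TestTable and SomethingExists names from above. Note that the search value "something" is baked into the function, so this only helps when the value you search for is fixed.

```sql
-- Filter on the persisted, indexed computed column instead of
-- evaluating Doc.exist(...) against the XML of every row.
SELECT *
FROM TestTable
WHERE SomethingExists = 1;
```

Whether the optimizer actually picks the index depends on how selective the bit column is; if most rows match, a scan may still be cheaper.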
Creating a Secondary XML Index of Path type might speed things up for you.
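A sketch of what that could look like for the table in the question. The index names are illustrative, and the first statement can be skipped if the primary XML index already exists. XML indexes also require the table to have a clustered primary key, which is assumed here.

```sql
-- Primary XML index (skip if it already exists on MyTable.Doc).
CREATE PRIMARY XML INDEX PXML_MyTable_Doc
    ON MyTable (Doc);

-- Secondary PATH index, built on top of the primary XML index.
-- PATH indexes target path-based predicates such as exist().
CREATE XML INDEX IXML_MyTable_Doc_Path
    ON MyTable (Doc)
    USING XML INDEX PXML_MyTable_Doc
    FOR PATH;
```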
Is there any solution or query to get the first N records from a stored procedure result without retrieving the whole result set?
Suppose my stored procedure returns 3 million rows, and I just want the first 10 rows from it.
The best approach would be to alter your stored procedure to be able to include a parameter for the TOP filter.
However, you could also use
SET ROWCOUNT 10
EXEC MyProc
SET ROWCOUNT 0 -- reset so later statements are not limited
Be careful to reset the value of ROWCOUNT afterwards otherwise you may impact other queries.
The downside is that you cannot control the order of the rows. I also haven't tested with such a large result set to identify whether this does reduce resource consumption enough.
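The first suggestion above, adding a parameter for the TOP filter, might look like the sketch below. The body of MyProc is not shown in the question, so the table and column names here (dbo.SomeTable, SomeKey, SomeValue) and the @MaxRows parameter are placeholders. CREATE OR ALTER needs SQL Server 2016 SP1 or later; use ALTER PROCEDURE on older versions.

```sql
-- Hypothetical version of MyProc with an optional row limit.
CREATE OR ALTER PROCEDURE MyProc
    @MaxRows INT = 2147483647      -- default: effectively no limit
AS
BEGIN
    SET NOCOUNT ON;

    SELECT TOP (@MaxRows) SomeKey, SomeValue
    FROM dbo.SomeTable             -- placeholder table
    ORDER BY SomeKey;              -- makes "first N" well defined
END;
GO

-- Return only the first 10 rows.
EXEC MyProc @MaxRows = 10;
```

Unlike SET ROWCOUNT, this keeps the limit and the ordering inside the procedure, so the caller controls which rows count as the "first" ones.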
You can use the TOP clause to achieve this.
Syntax
SELECT TOP number|percent column_name(s)
FROM table_name
WHERE condition;
Let's say that Your_stored_procedure returns a list of users:
CREATE PROCEDURE Your_stored_procedure
AS
SELECT UserId, UserName
FROM yourtable
GO
Next, create a temp table to store the rows returned by the stored procedure:
-- Columns must match the stored procedure's result set
CREATE TABLE #TempTable
(
UserId INT,
UserName varchar(100)
)
INSERT INTO #TempTable(UserId, UserName)
EXEC Your_stored_procedure
Then you can get the result like this:
SELECT TOP 10 UserId, UserName
FROM #TempTable
ORDER BY UserId -- Per @Squirrel's comment, TOP should be paired with ORDER BY
Note
Make sure the columns of the temp table match the structure of the result set returned by the stored procedure.
Updated
As @Vinod Kumar's comment points out, you can also achieve this by using OPENQUERY, like below:
SELECT TOP 1 * FROM OPENQUERY ([MyServer], 'EXEC [VinodTest].[dbo].[tblAuthorsLarge] @year = 2014')
You can use the OFFSET ... FETCH NEXT clause (available in SQL Server 2012 and later):
SELECT column-names
FROM table-name
ORDER BY column-names
OFFSET n ROWS
FETCH NEXT m ROWS ONLY
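As a concrete instance of the template above, here is a sketch that returns the first 10 rows. The dbo.Users table and its columns are hypothetical. OFFSET ... FETCH applies to a SELECT statement, not to the output of EXEC, so it would have to go inside the stored procedure or run against a table that already holds the procedure's results.

```sql
-- First 10 rows by UserId; dbo.Users is a hypothetical table.
SELECT UserId, UserName
FROM dbo.Users
ORDER BY UserId          -- ORDER BY is mandatory with OFFSET/FETCH
OFFSET 0 ROWS            -- skip nothing
FETCH NEXT 10 ROWS ONLY; -- then take 10 rows
```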
The filter on an indexed field, using a comparison with a variable, does not use the index in the query below:
Below is a query using a comparison with a constant, which does use the index:
The related index:
Please explain why the first query is not using the index, and how to make it use the index.
Thanks!
This is an ad-hoc query. The optimizer cannot see the value of a local variable when it compiles the statement, so it effectively ignores your variable and builds an execution plan that can be used no matter what the value is. For example, let's generate some data:
DROP TABLE IF EXISTS [dbo].[DataSource];
CREATE TABLE [dbo].[DataSource]
(
[ID] INT IDENTITY(1000, 1) PRIMARY KEY
,[DateTimeCreated] DATETIME2
,[SampleText] NVARCHAR(4000)
);
CREATE INDEX IX_DateTimeCreated ON [dbo].[DataSource] ([DateTimeCreated]);
INSERT INTO [dbo].[DataSource] ([DateTimeCreated], [SampleText])
SELECT SYSDATETIME()
,LEFT(REPLICATE([number], 3500), 3500)
FROM [master]..[spt_values];
UPDATE [dbo].[DataSource]
SET [DateTimeCreated] = '2018-01-01'
WHERE [ID] < 1051;
GO
The script above sets the 51 records with IDs 1000 to 1050 to the date 2018-01-01. Now, clear the buffers and the cache (do not execute this on a production SQL instance) and run the following queries separately:
DBCC DROPCLEANBUFFERS;
DBCC FREEPROCCACHE;
GO
DECLARE @filter DATETIME2 = '2018-01-01'
SELECT *
FROM [dbo].[DataSource]
WHERE [DateTimeCreated] = @filter;
GO
SELECT *
FROM [dbo].[DataSource]
WHERE [DateTimeCreated] = '2018-01-01';
You can see that the engine builds a separate execution plan for each query, and you get the same execution plans as in your example (the variable value is ignored):
SELECT cacheobjtype, objtype, text,usecounts
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
WHERE [objtype] = 'Adhoc'
AND [text] LIKE '%2018-01-01%'
AND [text] NOT LIKE '%dm_exec_cached_plans%'
ORDER BY usecounts DESC;
If you want to force the engine to build the plan with respect to the value of your variable, you can use the RECOMPILE option or a WITH (INDEX ...) hint:
DECLARE @filter DATETIME2 = '2018-01-01'
SELECT *
FROM [dbo].[DataSource]
WHERE [DateTimeCreated] = @filter
OPTION (RECOMPILE);
GO
DECLARE @filter DATETIME2 = '2018-01-01'
SELECT *
FROM [dbo].[DataSource] WITH (INDEX = IX_DateTimeCreated)
WHERE [DateTimeCreated] = @filter;
GO
I have some cases where I need to provide index hints, but usually it's better to have the correct indexes, to run regular index maintenance, and to write T-SQL statements that are easy for the engine to understand and optimize, rather than to worry about how it does its work.
In your case, the statement is pretty simple, so I believe this is just the default behavior of ignoring the value in order to speed up ad-hoc queries. You can wrap the statement in a stored procedure and check the cached plan and the execution plan again.
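A sketch of that last suggestion, using the DataSource table from above. The procedure name dbo.GetByDateTimeCreated is made up for illustration, and CREATE OR ALTER assumes SQL Server 2016 SP1 or later. Unlike a local variable, a procedure parameter's value is visible to the optimizer when the plan is first compiled, so the plan may differ from the ad-hoc one.

```sql
-- Hypothetical wrapper procedure around the same statement.
CREATE OR ALTER PROCEDURE dbo.GetByDateTimeCreated
    @filter DATETIME2
AS
BEGIN
    SELECT *
    FROM [dbo].[DataSource]
    WHERE [DateTimeCreated] = @filter;
END;
GO

EXEC dbo.GetByDateTimeCreated @filter = '2018-01-01';
GO

-- Look for the procedure's plan in the cache (objtype is 'Proc' here).
SELECT cacheobjtype, objtype, text, usecounts
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
WHERE [objtype] = 'Proc'
  AND [text] LIKE '%GetByDateTimeCreated%';
```

Keep in mind that the cached plan is reused for later calls with other values, so it reflects whichever value was passed on the first execution.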
Given a stored procedure like this:
create procedure BigParameterizedSearch(@criteria xml, @page int, @pageSize int)
as
...lots of populating table variables & transforming things...
select ..bunch of columns..
from ..bunch of tables..
where ..bunch of (@has_some_filter=0 or ..some filter criteria..) and..
order by ..big case statement depends on criteria..
offset (@page-1)*@pageSize rows
fetch next @pageSize rows only
option (recompile)
and its 'summary' counterpart:
create procedure BigParameterizedSearchSummary(@criteria xml, @page int, @pageSize int)
as
...same exact populating table variables & transforming things...
select groupCol, ..various aggregates..
from ..same exact bunch of tables..
where ..same exact bunch of (@has_some_filter=0 or ..some filter criteria..) and..
group by groupCol
order by ..smaller case statement depends on criteria..
offset (@page-1)*@pageSize rows
fetch next @pageSize rows only
option (recompile)
The two stored procedures are largely the same; only the select clause and the order by clause are different, and the 'summary' version adds a group by.
Now, the question is, how could these two stored procedures be combined? Or, how otherwise could the duplicate code be avoided? I have tried in vain to create a common table-valued-function with returns table as return so that the select and group by could be pulled out to calling stored procedures without impacting performance. The restrictions on returns table as return makes it too difficult to perform all of the complicated setup, and if I make it populate a table variable then the full result set is populated for each page which slows it down too much. Is there any other strategy aside from going full dynamic SQL?
You could use a view that contains all the columns from all the tables with all the filters you would need for both queries and then do your selects and groupings against that view. That would save you on the issue of redundancy if I understand your situation correctly. Honestly this sort of issue is exactly what views are for.
Also, I am just curious, do you really need to recompile every time? Is there something incredible going on that you can't afford to use a cached execution plan?
Could you not just split this into a sproc that populates the tables and a sproc that calls the table population using the same parameters and then queries the tables?
create table Prepop (a int)
go
create procedure uspPopulate (@StartNum int)
as
truncate table Prepop
insert into Prepop values
(@StartNum)
,(@StartNum+1)
,(@StartNum+2)
,(@StartNum+3)
go
create procedure uspCall (@StartNum int, @Summary bit)
as
exec uspPopulate @StartNum = @StartNum
if @Summary = 1
select avg(a) as Avga
from Prepop
else
select a
from Prepop
go
exec uspCall @StartNum = 6, @Summary = 1
exec uspCall @StartNum = 6, @Summary = 0
I have a big query to get multiple rows by id's like
SELECT *
FROM TABLE
WHERE Id in (1001..10000)
This query runs very slow and it ends up with timeout exception.
My temporary fix is to query with a limit, breaking the query into parts of 1000 ids each.
I heard that using temp tables may help in this case, but it also looks like MS SQL Server does that automatically underneath.
What is the best way to handle problems like this?
You could write the query as follows using a temporary table:
CREATE TABLE #ids(Id INT NOT NULL PRIMARY KEY);
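-- A single INSERT ... VALUES accepts at most 1000 rows,
-- so split longer Id lists across several INSERT statements.
INSERT INTO #ids(Id) VALUES (1001),(1002),/*add your individual Ids here*/,(10000);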
SELECT
t.*
FROM
[Table] AS t
INNER JOIN #ids AS ids ON
ids.Id=t.Id;
DROP TABLE #ids;
My guess is that it will probably run faster than your original query. Lookup can be done directly using an index (if it exists on the [Table].Id column).
Your original query translates to
SELECT *
FROM [TABLE]
WHERE Id=1001 OR Id=1002 OR /*...*/ OR Id=10000;
This may require evaluating the expression Id=1001 OR Id=1002 OR /*...*/ OR Id=10000 for every row in [Table], which probably takes longer than the version with a temporary table. The example with a temporary table takes each Id in #ids and looks for the corresponding Id in [Table] using an index.
This all assumes that there are gaps in the Ids between 1001 and 10000. Otherwise it would be easier to write
SELECT *
FROM [TABLE]
WHERE Id BETWEEN 1001 AND 10000;
This would also require an index on [Table].Id to speed it up.
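If [Table].Id is not already covered by a primary key or another index, a minimal sketch of creating one could look like this. The index name is illustrative, and [Table] stands in for the real table name as in the rest of this answer.

```sql
-- Only needed when Id is not already the primary key or otherwise indexed.
CREATE INDEX IX_Table_Id
    ON [Table] (Id);
```

Both the BETWEEN range query and the join against #ids can then seek on Id instead of scanning the whole table.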
IMHO, SQL Server can choose by itself (unless told otherwise) the best index to use for a query.
OK.
What about something like this (pseudocode):
select __a from tbl where __a not in
(
select __b from tbl
)
(Let's say we have index_1 on (__a) and index_2 on (__b).)
Will SQL Server still use one index at execution, or multiple indexes together?
First, create your tables:
USE tempdb;
GO
CREATE TABLE dbo.tbl(__a INT, __b INT);
Then create two indexes:
CREATE INDEX a_index ON dbo.tbl(__a);
CREATE INDEX b_index ON dbo.tbl(__b);
Now populate with some data:
INSERT dbo.tbl(__a, __b)
SELECT [object_id], column_id
FROM sys.all_columns;
Now run your query and turn actual execution plan on. You will see something like this, showing that yes, both indexes are used (in fact the index on __b is used both for data retrieval in the subquery and as a seek to eliminate rows):
A more efficient way to write your query would be:
select __a from dbo.tbl AS t where not exists
(
select 1 from dbo.tbl AS t2
where t2.__b = t.__a
);
Now here's your whole plan (again, both indexes are used, but notice how there are far fewer operations):
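One caveat when swapping NOT IN for NOT EXISTS: the two are only equivalent when the subquery column contains no NULLs. The sketch below uses a throwaway table variable (not the dbo.tbl data above) to show the difference.

```sql
-- Throwaway data: one row has NULL in column b.
DECLARE @t TABLE (a INT, b INT);
INSERT @t (a, b) VALUES (1, 2), (3, NULL);

-- NOT IN: the NULL in b makes every comparison evaluate to UNKNOWN,
-- so this returns no rows at all.
SELECT a
FROM @t
WHERE a NOT IN (SELECT b FROM @t);

-- NOT EXISTS: a NULL never matches, so both 1 and 3 are returned.
SELECT t.a
FROM @t AS t
WHERE NOT EXISTS (SELECT 1 FROM @t AS t2 WHERE t2.b = t.a);
```

Since __b in dbo.tbl is declared without NOT NULL, it is worth checking for NULLs before relying on the two forms returning the same rows.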