Does an index on a Postgres table speed searches of views that reference it?
For example, suppose I have the following:
CREATE TABLE my_table(my_column INT); -- Then insert lots of rows into the table.
CREATE VIEW my_view AS SELECT my_column FROM my_table;
CREATE INDEX my_index ON my_table(my_column);
SELECT * FROM my_view WHERE my_column = 1;
Does the SELECT statement on line 4 benefit from the index on line 3?
Yes, that will certainly work. The query rewriter replaces the view with its definition, and the optimizer processes the result.
EXPLAIN the query and convince yourself.
A colleague works in a business which uses Microsoft SQL Server. Their team creates stored procedures that are executed daily to create data extracts. The underlying tables are huge (some have billions of rows), so most stored procedures are designed such that first they extract only the relevant rows of these huge tables into temporary tables, and then the temp tables are joined with each other and with other smaller tables to create a final extract. Something similar to this:
SELECT COL1, COL2, COL3
INTO #TABLE1
FROM HUGETABLE1
WHERE COL4 IN ('foo', 'bar');
SELECT COL1, COL102, COL103
INTO #TABLE2
FROM HUGETABLE2
WHERE COL14 = 'blah';
SELECT COL1, COL103, COL306
FROM #TABLE1 AS T1
JOIN #TABLE2 AS T2
ON T1.COL1 = T2.COL1
LEFT JOIN SMALLTABLE AS ST
ON T1.COL3 = ST.COL3
ORDER BY T1.COL1;
Generally, the temporary tables are not modified after their creation (so no subsequent ALTER, UPDATE or INSERT operations). For the purpose of this discussion, let's assume the temporary tables are only used once later on (so only one SELECT query would rely on them).
Here is the question: is it a good idea to index these temporary tables after they are created and before they are used in the subsequent query?
My colleague believes that creating an index will make the join and the sort operations faster. I believe, however, that the total time will be larger, because index creation takes time. In other words, I assume that except for edge cases (like a temporary table which itself is extremely large, or the final SELECT query is very complex), SQL Server will use the statistics it has on the temporary tables to optimize the final query, and in doing so it will effectively index the temp tables as it sees fit.
In other words, I am used to think that creating an index is only useful if you know that table is used often; a single-use temporary table that is dropped once the stored procedure is complete is not worth indexing.
Neither of us knows enough about SQL Server optimizer to know in what ways we are right or wrong. Can you please help us better understand which of our assumptions are closer to truth?
Your friend is probably correct, because even if a table's going to be used in a single query, without seeing the query (and even if we do, we still don't have a great idea of what it's execution plan looks like) we have no idea how many times SQL Server will need to find data within various columns of each of those tables for joins, sorts, etc.
However, we'll never know for sure until it's actually done both ways and the results measured and compared.
If you are doing daily data extracts with billions of rows, I would recommend you use a staging tables instead of a temporary table. This will isolate your extracts from other resources using tempdb.
Here is the question: is it a good idea to index these temporary tables after they are created and before they are used in the subsequent query?
Create the index after loading the data into temp table. This will eliminate fragmentation and statistics will be created.
the optimizer will use statistics to generate the optimal plan. So if you don't have a statistics, it could dramatically affect your query performance especially for large datasets.
Example below query the before and after comparison of index creation in temp table:
/* Create index after data load into temp table -- stats is created */
CREATE TABLE #temp ( [text] varchar(50), [num] int);
INSERT INTO #temp([text], [num]) VALUES ('aaa', 1), ('bbb', 2) , ('ccc',3);
CREATE UNIQUE CLUSTERED INDEX [IX_num] ON #temp (num);
DBCC SHOW_STATISTICS ('tempdb..#temp', 'IX_num');
/* Create index before data load into temp table -- stats is not created */
CREATE TABLE #temp_nostats ( [text] varchar(50), [num] int);
CREATE UNIQUE CLUSTERED INDEX [IX_num] ON #temp_nostats (num);
INSERT INTO #temp_nostats([text], [num]) VALUES ('aaa', 1), ('bbb', 2) , ('ccc',3);
DBCC SHOW_STATISTICS ('tempdb..#temp_nostats', 'IX_num');
You need to test if the index will help you or not. You need to balance how many index you can have because it can also impact your performance if you have too many index.
Using SQL Server 2014:
Is there any performance difference between the following statements?
DELETE FROM MyTable where PKID IN (SELECT PKID FROM #TmpTableVar)
AND
DELETE FROM MyTable INNER JOIN #TmpTableVar t ON MyTable.PKID = t.PKID
In your given example the execution plans will be the same (most probably).
But having same execution plans doesn't mean that they are the best execution plans you can possibly have for this statement.
The problem I see in both of your queries is the use of the Table Variable.
SQL Server always assumes that there is only 1 row in the table variable. Only in SQL Server 2014 and later version this assumption has been changed to 100 rows.
So no matter how many rows you have this the table variable SQL Server will always assume you have one row in the #TmpTableVar.
You can change your code slightly to give SQL Server a better idea of how many rows there will be in that table by replacing it with a Temporary table and since it is a PK_ID Column in your table variable you can also create an index on that table, to give best chance to sql server to come up with the best possible execution plan for this query.
SELECT PKID INTO #Temp
FROM #TmpTableVar
-- Create some index on the temp table here .....
DELETE FROM MyTable
WHERE EXISTS (SELECT 1
FROM #Temp t
WHERE MyTable.PKID = t.PKID)
Note
In operator will work fine since it is a primary key column in the table variable. but if you ever use IN operator on a nullable column, the results may surprise you, The IN operator goes all pear shape as soon as it finds a NULL values in the column it is checking on.
I personally prefer Exists operator for such queries but inner joins should also work just fine but avoid IN operators if you can.
I have a big query to get multiple rows by id's like
SELECT *
FROM TABLE
WHERE Id in (1001..10000)
This query runs very slow and it ends up with timeout exception.
Temp fix for it is querying with limit, break this query into 10 parts per 1000 id's.
I heard that using temp tables may help in this case but also looks like ms sql server automatically doing it underneath.
What is the best way to handle problems like this?
You could write the query as follows using a temporary table:
CREATE TABLE #ids(Id INT NOT NULL PRIMARY KEY);
INSERT INTO #ids(Id) VALUES (1001),(1002),/*add your individual Ids here*/,(10000);
SELECT
t.*
FROM
[Table] AS t
INNER JOIN #ids AS ids ON
ids.Id=t.Id;
DROP TABLE #ids;
My guess is that it will probably run faster than your original query. Lookup can be done directly using an index (if it exists on the [Table].Id column).
Your original query translates to
SELECT *
FROM [TABLE]
WHERE Id=1000 OR Id=1001 OR /*...*/ OR Id=10000;
This would require evalutation of the expression Id=1000 OR Id=1001 OR /*...*/ OR Id=10000 for every row in [Table] which probably takes longer than with a temporary table. The example with a temporary table takes each Id in #ids and looks for a corresponding Id in [Table] using an index.
This all assumes that there are gaps in the Ids between 1000 and 10000. Otherwise it would be easier to write
SELECT *
FROM [TABLE]
WHERE Id BETWEEN 1001 AND 10000;
This would also require an index on [Table].Id to speed it up.
I'm trying to query an xml column in sql server.
I've created a primary index on the column and query it using:
SELECT *
FROM MyTable
where Doc.exist('/xml/root/propertyx/text()[. = "something"]') = 1
In a table with 60 000 entries , this query takes some 100 ms on my local dev machine.
Is it possible to optimize this somehow to increase performance of the query?
You can optimize for fast query times with a calculated column. A calculated column can't use the XML functions directly, so you have to wrap them in a function:
go
create function dbo.GetSomethingExists(
#Doc xml)
returns bit
with schemabinding
as begin return (
select #Doc.exist('/xml/root/property/text()[. = "something"]')
) end
go
create table TestTable (
Doc xml,
SomethingExists as dbo.GetSomethingExists(Doc) persisted
)
go
If you declare the function with schemabinding, you can create an index on SomethingExists:
create index IX_TestTable_SomethingExists on TestTable(SomethingExists)
This should make the query much faster.
Creating a Secondary XML Index of Path type might speed things up for you.