Does an index on a Postgres table speed searches of views that reference it?
For example, suppose I have the following:
CREATE TABLE my_table(my_column INT); -- Then insert lots of rows into the table.
CREATE VIEW my_view AS SELECT my_column FROM my_table;
CREATE INDEX my_index ON my_table(my_column);
SELECT * FROM my_view WHERE my_column = 1;
Does the SELECT statement on line 4 benefit from the index on line 3?
Yes, that will certainly work. The query rewriter replaces the view with its definition, and the optimizer processes the result.
EXPLAIN the query and convince yourself.
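For example (the exact plan depends on table size and statistics, but with enough rows you would typically see my_index in it):
EXPLAIN SELECT * FROM my_view WHERE my_column = 1;
-- Typical output fragment once the table is large enough:
--   Index Only Scan using my_index on my_table
--     Index Cond: (my_column = 1)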
Query in Teradata:
create table databasename.tablename, no fallback, no before journal, no after journal
as
(
select a, b
from databasename2.tablename2
where x=y
) with data unique primary index (a,b);
getting converted to (in Snowflake):
CREATE TABLE IF NOT EXISTS databasename.tablename AS SELECT a,b FROM databasename2.tablename2 WHERE x = y;
INSERT OVERWRITE INTO databasename.tablename SELECT DISTINCT * FROM databasename.tablename;
I cannot understand which part of the Teradata query is getting converted to the INSERT OVERWRITE query. What is the significance of this insert query at the end?
The Teradata command and the Snowflake commands may or may not result in the same rows in the table. Even if they do, whether the two-step approach written for Snowflake is the most efficient one is another matter. It probably isn't, but here's what's happening:
CREATE TABLE IF NOT EXISTS databasename.tablename AS
SELECT a,b FROM databasename2.tablename2 WHERE x = y;
This creates tablename from the rows of tablename2 where x = y.
INSERT OVERWRITE INTO databasename.tablename
SELECT DISTINCT * FROM databasename.tablename;
This step reads the rows from the table just created, tablename, and selects only the distinct ones. It then truncates tablename and fills it with the rows of that result set (that is, the distinct rows).
In Teradata, the create statement is enforcing a unique index on columns (a,b). Since Snowflake does not enforce primary keys or unique indexes, it may be necessary to deduplicate data. However, this deduplication on Snowflake is not happening against (a,b). It's happening on *, so all columns. This can lead to differences in the tables after creation.
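Note that in this particular query the select list is exactly (a, b), so deduplicating on * and on (a, b) happen to coincide; the two Snowflake statements could then be collapsed into one (a sketch, assuming the select list stays as is):
CREATE TABLE IF NOT EXISTS databasename.tablename AS
SELECT DISTINCT a, b
FROM databasename2.tablename2
WHERE x = y;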
I have a stored procedure that returns a dataset from a dynamic pivot query (meaning the pivot columns aren't known until run-time because they are driven by data).
The first column in this dataset is a product id. I want to join that product id with another product table that has all sorts of other columns that were created at design time.
So, I have a normal table with a product id column and I have a "dynamic" dataset that also has a product id column that I get from calling a stored procedure. How can I inner join those 2?
Dynamic SQL is very powerful, but has some severe drawbacks. One of them is exactly this: you cannot use its result in ad-hoc SQL.
The only way to get the result of an SP into a table is to create a table with a fitting schema and use the INSERT INTO NewTbl EXEC ... syntax...
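For example (a sketch with made-up names; this only works when the result schema is known up front, which is exactly what a dynamic pivot prevents):
-- The capture table's schema must match the SP's result set exactly.
CREATE TABLE #Captured (ProductId INT, Col1 INT, Col2 INT);
INSERT INTO #Captured EXEC dbo.MyPivotProc; -- hypothetical procedure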
But there are other possibilities:
1) Use SELECT ... INTO ... FROM
Within your SP, when the dynamic SQL is executed, you could add INTO NewTbl to your select:
SELECT Col1, Col2, [...] INTO NewTbl FROM ...
This will create a table with the fitting schema automatically.
You might even hand in the name of the new table as a parameter - it is dynamic SQL anyway - but in this case it will be more difficult to handle the join outside (it must be dynamic again).
If you need your SP to return the result, just add SELECT * FROM NewTbl at the end. This will return the same result set as before.
Outside your SP you can join this table like any normal table...
BUT, there is a big BUT - oops, this sounds nasty somehow - this will fail if the table already exists...
So you have to drop it first, which can lead to deep trouble if this is a multi-user application with possible concurrency.
If not: Use IF EXISTS(SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='NewTbl') DROP TABLE NewTbl;
If yes: Create the table with a name you pass in as a parameter and do your external query dynamically with this name.
After this you can re-create this table using the SELECT ... INTO syntax...
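Putting option 1 together, a minimal sketch might look like this (dbo.Sales, dbo.Products, the year columns, and the procedure body are illustrative, not from the question):
-- Inside the procedure: materialize the dynamic pivot with SELECT ... INTO.
IF EXISTS(SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='NewTbl') DROP TABLE dbo.NewTbl;
DECLARE @cols NVARCHAR(MAX) = N'[2023],[2024]'; -- normally built from the data at run-time
DECLARE @sql NVARCHAR(MAX) = N'
    SELECT ProductId, ' + @cols + N'
    INTO dbo.NewTbl
    FROM (SELECT ProductId, SalesYear, Amount FROM dbo.Sales) AS s
    PIVOT (SUM(Amount) FOR SalesYear IN (' + @cols + N')) AS p;';
EXEC sys.sp_executesql @sql;
SELECT * FROM dbo.NewTbl; -- optional: return the result set as before
-- Outside the procedure: join like any normal table.
SELECT p.*, n.*
FROM dbo.Products AS p
INNER JOIN dbo.NewTbl AS n ON n.ProductId = p.ProductId;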
2) Use XML
One advantage of XML is the fact that any structure and any amount of data can be stuffed into one single column.
Let your SP return a table with one single XML column. As you know this schema up front, you can create a table and use INSERT INTO XmlTable EXEC ....
Knowing that there will be a ProductID element, you can extract this value and create a two-column derived table with the ID and the corresponding XML. This is easy to join.
Using wildcards in XQuery makes it possible to query XML data without knowing all the details...
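A sketch of option 2 (dbo.GetPivotAsXml and dbo.Products are hypothetical names; the only assumption is that each XML document contains a ProductID element):
-- The XML column's schema is fixed, so the capture table can be created up front.
CREATE TABLE #XmlResult (Payload XML);
INSERT INTO #XmlResult EXEC dbo.GetPivotAsXml;
-- Extract the known ProductID element; the rest of the XML stays opaque.
SELECT p.*, d.Payload
FROM (
    SELECT Payload.value('(//ProductID)[1]', 'int') AS ProductId, Payload
    FROM #XmlResult
) AS d
INNER JOIN dbo.Products AS p ON p.ProductId = d.ProductId;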
3) This was my favourite: Don't use dynamic queries...
I have a big query to get multiple rows by IDs, like
SELECT *
FROM TABLE
WHERE Id in (1001..10000)
This query runs very slowly and ends up with a timeout exception.
A temporary fix is to query with a limit, breaking the query into 10 parts of 1,000 IDs each.
I heard that using temp tables may help in this case, but it also looks like MS SQL Server does this automatically underneath.
What is the best way to handle problems like this?
You could write the query as follows using a temporary table:
CREATE TABLE #ids(Id INT NOT NULL PRIMARY KEY);
INSERT INTO #ids(Id) VALUES (1001),(1002),/*add your individual Ids here*/,(10000);
SELECT
t.*
FROM
[Table] AS t
INNER JOIN #ids AS ids ON
ids.Id=t.Id;
DROP TABLE #ids;
My guess is that it will probably run faster than your original query. Lookups can be done directly using an index (if one exists on the [Table].Id column).
Your original query translates to
SELECT *
FROM [TABLE]
WHERE Id=1001 OR Id=1002 OR /*...*/ OR Id=10000;
This would require evaluation of the expression Id=1001 OR Id=1002 OR /*...*/ OR Id=10000 for every row in [Table], which probably takes longer than with a temporary table. The example with a temporary table takes each Id in #ids and looks for a corresponding Id in [Table] using an index.
This all assumes that there are gaps in the Ids between 1001 and 10000. Otherwise it would be easier to write
SELECT *
FROM [TABLE]
WHERE Id BETWEEN 1001 AND 10000;
This would also require an index on [Table].Id to speed it up.
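If such an index does not exist yet, it can be created like this (IX_Table_Id is just an illustrative name):
CREATE INDEX IX_Table_Id ON [Table](Id);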
When I run this code, it gives me different sorting results. When I manually do this in Excel, I always get the same results. Can anyone help? Thanks.
select * into tblVSOELookupSort1 from tblVSOELookup order by
[SKU],[ASP Local],[Sum of Qty]
alter table tblVSOELookupSort1 add RowID int identity(1,1) not null
select * into tblVSOELookupSort2 from tblVSOELookupSort1 order by
[Region Per L/U],[Currency]
drop table tblVSOELookupSort1
drop table tblVSOELookup
exec sp_rename tblVSOELookupSort2, tblVSOELookup
select * from tblVSOELookup
That's normal. SQL databases in general do not guarantee a particular row ordering of results unless you specify one. The order is dependent on the RDBMS implementation, query plan, and other things. If you want a particular row ordering in your query results, you must include an ORDER BY clause in your query. In this case, select * from tblVSOELookup order by ....
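For example, using the columns from the question (choose whatever ordering you actually need):
SELECT *
FROM tblVSOELookup
ORDER BY [Region Per L/U], [Currency], [SKU], [ASP Local], [Sum of Qty];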
IMHO SQL Server can choose by itself (unless told otherwise) which index is best to use for a query.
Ok
What about something like this (pseudo code):
select __a from tbl where __a not in
(
select __b from tbl
)
(Let's say we have index_1 on (__a) and index_2 on (__b).)
Will SQL Server still use one index at execution or multiple indexes together...?
First, create your tables:
USE tempdb;
GO
CREATE TABLE dbo.tbl(__a INT, __b INT);
Then create two indexes:
CREATE INDEX a_index ON dbo.tbl(__a);
CREATE INDEX b_index ON dbo.tbl(__b);
Now populate with some data:
INSERT dbo.tbl(__a, __b)
SELECT [object_id], column_id
FROM sys.all_columns;
Now run your query with the actual execution plan turned on. You will see that yes, both indexes are used (in fact the index on __b is used both for data retrieval in the subquery and as a seek to eliminate rows).
A more efficient way to write your query would be:
select __a from dbo.tbl AS t where not exists
(
select 1 from dbo.tbl AS t2
where t2.__b = t.__a
);
In the resulting plan, again, both indexes are used, but notice that there are far fewer operations.