I have recently taken on some of the Jr. DBA functions for my company. We're running SQL Server 2012.
I created an index yesterday, but the query optimizer doesn't seem to want to use it. I can force the index with a query hint; however, I want to ensure that the other users of the system are going to benefit from this index as well.
select count(*)
from prospect p
join phone ph with (index([ix_phone_ProspectId]))
on ph.prospect_id = p.prospect_id
The above query runs in about 3 seconds.
The query below runs for about a minute, at which point I stop it:
select count(*)
from prospect p
join phone ph on ph.prospect_id = p.prospect_id
Any suggestions on how I can help the query optimizer find this index and start using it going forward? (Note: statistics have been updated since the index was deployed.)
You could create a view, put the index hint in there, and point users to that view.
CREATE TABLE X(i int PRIMARY KEY);
CREATE INDEX xx ON X(i)
GO
CREATE VIEW dbo.vwX WITH SCHEMABINDING AS
select xxx = count(*)
from dbo.X p with (index(xx))
You can also drop the new index, create an indexed view instead, and point users to that.
ALTER VIEW dbo.vwX WITH SCHEMABINDING AS
select xxx = count_big(*)
from dbo.X p
CREATE UNIQUE CLUSTERED INDEX vwXX ON dbo.vwX(xxx);
Note that you can't use an index hint with an indexed view unless it's an in-memory table.
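As a usage sketch (assuming a non-Enterprise edition of SQL Server, whose optimizer won't match indexed views automatically), users would query the view with the NOEXPAND table hint so the view's clustered index is actually read:
SELECT xxx
FROM dbo.vwX WITH (NOEXPAND);
-- NOEXPAND tells the optimizer to read the indexed view itself
-- instead of expanding it to the underlying base tables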
Why is it that when I query my base table with the following aggregate query, Snowflake doesn't reference my MV?
create or replace table customer_sample as (
SELECT * FROM
"SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF100TCL"."CUSTOMER");
create or replace materialized view customer_sample_mv
as
select c_customer_sk,
sum(c_current_hdemo_sk) total_sum
from customer_sample
group by 1;
select c_customer_sk,
sum(c_current_hdemo_sk) total_sum
from customer_sample
group by 1;
[Query Profile screenshot]
There are lots of possible reasons, e.g.:
- The MV was still being built when you executed the query (you can check this; see the snippet below)
- Snowflake determined it was quicker to execute the query without using the MV
- The user running the query didn't have the required privileges on the MV
- etc.
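For the first point, one quick check (this relies on the BEHIND_BY column in the output of SHOW MATERIALIZED VIEWS, which reports how far the MV's maintenance lags behind the base table):
show materialized views like 'customer_sample_mv';
-- a non-zero behind_by means the MV had not fully caught up
-- with the base table when the query ran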
In this example Snowflake is doing the right thing by skipping the materialized view.
First surprise: Scanning the materialized view is slower than just re-running the query:
select *
from customer_sample_mv
order by total_sum desc nulls last
limit 100;
-- 4.4s
vs
select *
from (
select c_customer_sk,
sum(c_current_hdemo_sk) total_sum
from customer_sample
group by 1
)
order by total_sum desc nulls last
limit 100;
-- 3.6s
So Snowflake is saving time by not choosing the materialized view.
How is this possible?
Well, it turns out there are no repeated customer ids, so pre-grouping them does nothing:
select c_customer_sk, count(*) c
from customer_sample
group by 1
having c>1
order by 2 desc
limit 10;
-- no rows returned
From the docs:
Even if a materialized view can replace the base table in a particular query, the optimizer might not use the materialized view. For example, if the base table is clustered by a field, the optimizer might choose to scan the base table (rather than the materialized view) because the optimizer can effectively prune out partitions and provide equivalent performance using the base table.
https://docs.snowflake.com/en/user-guide/views-materialized.html#how-the-query-optimizer-uses-materialized-views
I have the following query that is taking more than 1 hour to run.
SELECT
RES.NUM_PROCESS,
RES.ID_SYSTEM
FROM
RESTRICTED_PRECESS RES -- 16,000 records
WHERE
RES.ID_SYSTEM <> 'CYFV'
AND RES.NUM_PROCESS NOT IN (SELECT PR.NUM_PROCESS
FROM PRECESS PR -- 8,000,000 records
WHERE PR.ID_SYSTEM = RES.ID_SYSTEM)
The indexes for the tables are already ok.
CREATE NONCLUSTERED INDEX [IX1_PROCESS] ON [dbo].[PRECESS]
(
ID_SYSTEM ASC
)
INCLUDE(NUM_PROCESS)
Here's the execution plan:
Is there any way to make this SELECT return records faster?
Thank you.
I will just go ahead and suggest indexes that might be helpful here for the two tables:
CREATE INDEX idx1 ON RESTRICTED_PRECESS (ID_SYSTEM, NUM_PROCESS);
CREATE INDEX idx2 ON PRECESS (ID_SYSTEM, NUM_PROCESS);
The index on the outer table RESTRICTED_PRECESS should speed up the WHERE clause, and it also completely covers the SELECT clause. The index on PRECESS should speed up the correlated subquery as well.
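Beyond indexing, it may also be worth testing the correlated NOT IN rewritten as NOT EXISTS, which SQL Server often turns into a cleaner anti-semi-join. Note the two forms are only equivalent if PRECESS.NUM_PROCESS cannot be NULL:
SELECT RES.NUM_PROCESS,
       RES.ID_SYSTEM
FROM RESTRICTED_PRECESS RES
WHERE RES.ID_SYSTEM <> 'CYFV'
  AND NOT EXISTS (SELECT 1
                  FROM PRECESS PR
                  WHERE PR.ID_SYSTEM = RES.ID_SYSTEM
                    AND PR.NUM_PROCESS = RES.NUM_PROCESS);
-- the anti-join probe is fully covered by idx2 above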
I have a UI form that shows the user different aggregate information (fact, plan, etc.; 6 different T-SQL queries run in parallel). Executing the raw SQL queries takes up to 3 seconds.
I use stored procedures with parameters, but that is not the problem: calling the SPs takes exactly the same time.
Here I use one table and one query as an example; the other 5 queries and tables have the same structure. I use MS SQL Server 2012; upgrading to 2014 is possible if there is an optimization reason to do so.
Now I am trying to find all possible ways to improve this, and they should be SQL-only.
Aggregate table structure:
create table dbo.plan_Total(
    VersionId int not null,
    WarehouseId int not null,
    ChannelUnitId int not null,
    ProductId int not null,
    Month date not null,
    Volume float not null,
    constraint PK_Total primary key clustered
    (VersionId asc, WarehouseId asc, ChannelUnitId asc, ProductId asc, Month asc)) on [PRIMARY]
SP query structure:
ALTER PROCEDURE dbo.plan_GetTotals
    @versionId INT,
    @geoIds ID_LIST READONLY, -- lists from UI filters
    @productIds ID_LIST READONLY,
    @channelUnitIds ID_LIST READONLY
AS
BEGIN
    SELECT Id INTO #geos
    FROM @geoIds
    SELECT Id INTO #products
    FROM @productIds
    SELECT Id INTO #channels
    FROM @channelUnitIds

    CREATE CLUSTERED INDEX IDX_Geos ON #geos(Id)
    CREATE CLUSTERED INDEX IDX_Products ON #products(Id)
    CREATE CLUSTERED INDEX IDX_ChannelUnits ON #channels(Id)

    SELECT Month, SUM(Volume) AS Volume
    FROM plan_Total t
    JOIN #geos g ON t.WarehouseId = g.Id
    JOIN #products p ON t.ProductId = p.Id
    JOIN #channels cu ON t.ChannelUnitId = cu.Id
    WHERE VersionId = @versionId
    GROUP BY Month
    ORDER BY Month -- no noticeable performance impact
END
Approximate execution time is 600-800 ms, and the other queries take almost the same time.
How can I dramatically decrease execution time? Is it possible at all?
What I've done already:
- Tried columnstore indexes (a clustered one is not viable because of foreign key problems);
- Disabling the non-clustered columnstore index is not a solution either, because some tables need online updates (users can change the data);
- Rebuilt all current indexes;
- I can't merge all the tables into one.
Here is the actual plan link:
Actual execution plan (for this plan I put the real tables in the joins instead of the temp tables).
BR, thanks for any help!
Have you considered not joining channel, product, etc. at all?
At least for channels: if you do not have 10,000 of them, you can just load them "on demand" or "on application start" and cache them. This is a client-side dictionary lookup.
Also, for Month, SUM(Volume): consider precalculating this by making a materialized view. Calculating this on demand is not what reporting should do, and it goes against data warehousing best practices.
All your other tweaks will not change that - they do not address the real problem: too much processing in the query.
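A rough sketch of the precalculation idea (table and constraint names are illustrative; this only pays off for the unfiltered case, and it must be refreshed whenever users change plan_Total, e.g. by trigger or from the save procedure):
CREATE TABLE dbo.plan_MonthTotals(
    VersionId int NOT NULL,
    [Month] date NOT NULL,
    Volume float NOT NULL,
    CONSTRAINT PK_plan_MonthTotals PRIMARY KEY CLUSTERED (VersionId, [Month]));

INSERT INTO dbo.plan_MonthTotals(VersionId, [Month], Volume)
SELECT VersionId, Month, SUM(Volume)
FROM dbo.plan_Total
GROUP BY VersionId, Month;
-- the UI then reads one narrow row per month instead of aggregating on demand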
See if this way works better:
- Create the TABLE type with a PRIMARY KEY.
- Specify option RECOMPILE: force the compiler to take the actual cardinality of the table variables into account.
- Specify option OPTIMIZE FOR UNKNOWN: prevent parameter sniffing on @versionId.
CREATE TYPE dbo.ID_LIST AS TABLE (
Id INT PRIMARY KEY
);
GO
CREATE PROCEDURE dbo.plan_GetTotals
@versionId INT,
@geoIds ID_LIST READONLY,
@productIds ID_LIST READONLY,
@channelUnitIds ID_LIST READONLY
AS
SELECT
Month,
SUM(Volume) AS Volume
FROM
plan_Total AS t
INNER JOIN @geoIds AS g ON g.Id=t.WarehouseId
INNER JOIN @productIds AS p ON p.Id=t.ProductId
INNER JOIN @channelUnitIds AS c ON c.Id=t.ChannelUnitId
WHERE
t.VersionId=@versionId
GROUP BY
Month
ORDER BY
Month
OPTION(RECOMPILE, OPTIMIZE FOR UNKNOWN);
GO
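A call would look something like this (the id values are purely illustrative):
DECLARE @geos dbo.ID_LIST;
DECLARE @products dbo.ID_LIST;
DECLARE @channels dbo.ID_LIST;
INSERT INTO @geos(Id) VALUES (1),(2);
INSERT INTO @products(Id) VALUES (10),(20),(30);
INSERT INTO @channels(Id) VALUES (5);
EXEC dbo.plan_GetTotals
    @versionId = 1,
    @geoIds = @geos,
    @productIds = @products,
    @channelUnitIds = @channels;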
OK, here I will just show what I found and how I increased the speed of my query.
List of changes:
- The best option is to add a clustered columnstore index. For that you need to drop the FKs, but you can enforce them with triggers, for example. This sped the query up 3-4 times.
- As you can see, I use temp tables in the query joins. I changed one of the joins (it doesn't matter which) to an IN predicate, like "and t.productid in (select id from #productids)", and that alone doubled the raw query speed.
These two changes had the most impact on the query. Below is the final query:
select [month], sum(volume) as volume
from #geos g
left join dbo.plan_Total t on t.warehouseid = g.id
join #channels cu on t.channelunitid = cu.id
where versionid = @versionId
and t.productid in (select id from #productids)
group by [month]
order by [Month]
With these changes I decreased the query execution time from 0.8 s to 0.2 s.
I have a big query to get multiple rows by ids, like:
SELECT *
FROM TABLE
WHERE Id in (1001..10000)
This query runs very slowly and ends up with a timeout exception.
A temporary fix is to limit the query: break it into 10 parts of 1,000 ids each.
I heard that using temp tables may help in this case, but it also looks like MS SQL Server does something like that automatically underneath.
What is the best way to handle problems like this?
You could write the query as follows using a temporary table:
CREATE TABLE #ids(Id INT NOT NULL PRIMARY KEY);
INSERT INTO #ids(Id) VALUES (1001),(1002),/*add your individual Ids here*/,(10000);
SELECT
t.*
FROM
[Table] AS t
INNER JOIN #ids AS ids ON
ids.Id=t.Id;
DROP TABLE #ids;
My guess is that it will probably run faster than your original query: the lookup can be done directly using an index (if one exists on the [Table].Id column).
Your original query translates to
SELECT *
FROM [TABLE]
WHERE Id=1001 OR Id=1002 OR /*...*/ OR Id=10000;
This would require evaluating the expression Id=1001 OR Id=1002 OR /*...*/ OR Id=10000 for every row in [Table], which probably takes longer than with a temporary table. The example with a temporary table takes each Id in #ids and looks up the corresponding Id in [Table] using an index.
This all assumes that there are gaps in the Ids between 1001 and 10000. Otherwise it would be easier to write:
SELECT *
FROM [TABLE]
WHERE Id BETWEEN 1001 AND 10000;
This would also require an index on [Table].Id to speed it up.
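If the id list comes from application code, another option (a sketch with made-up names: dbo.IdList, dbo.GetRowsByIds) is a table-valued parameter, which avoids building a giant INSERT statement and gives the optimizer the same indexed lookup:
CREATE TYPE dbo.IdList AS TABLE (Id INT NOT NULL PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetRowsByIds
    @ids dbo.IdList READONLY
AS
SELECT t.*
FROM [Table] AS t
INNER JOIN @ids AS i ON i.Id = t.Id; -- same index seek per Id as the #ids version
GO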
There is an old SSIS package that pulls a lot of data from Oracle into our SQL Server database every day. The data is inserted into a non-normalized database, and I'm working on a stored procedure to select that data and insert it into a normalized database. The Oracle databases were overly normalized, so the query I wrote ended up having 12 inner joins to get all the columns I need. Another problem is that I'm dealing with large amounts of data: one table I'm selecting from has over 12 million records. Here is my query:
Declare @MewLive Table
(
UPC_NUMBER VARCHAR(50),
ITEM_NUMBER VARCHAR(50),
STYLE_CODE VARCHAR(20),
COLOR VARCHAR(8),
SIZE VARCHAR(8),
UPC_TYPE INT,
LONG_DESC VARCHAR(120),
LOCATION_CODE VARCHAR(20),
TOTAL_ON_HAND_RETAIL NUMERIC(14,0),
VENDOR_CODE VARCHAR(20),
CURRENT_RETAIL NUMERIC(14,2)
)
INSERT INTO @MewLive(UPC_NUMBER,ITEM_NUMBER,STYLE_CODE,COLOR,[SIZE],UPC_TYPE,LONG_DESC,LOCATION_CODE,TOTAL_ON_HAND_RETAIL,VENDOR_CODE,CURRENT_RETAIL)
SELECT U.UPC_NUMBER, REPLACE(ST.STYLE_CODE, '.', '')
+ '-' + SC.SHORT_DESC + '-' + REPLACE(SM.PRIM_SIZE_LABEL, '.', '') AS ItemNumber,
REPLACE(ST.STYLE_CODE, '.', '') AS Style_Code, SC.SHORT_DESC AS Color,
REPLACE(SM.PRIM_SIZE_LABEL, '.', '') AS Size, U.UPC_TYPE, ST.LONG_DESC, L.LOCATION_CODE,
IB.TOTAL_ON_HAND_RETAIL, V.VENDOR_CODE, SD.CURRENT_RETAIL
FROM MewLive.dbo.STYLE AS ST INNER JOIN
MewLive.dbo.SKU AS SK ON ST.STYLE_ID = SK.STYLE_ID INNER JOIN
MewLive.dbo.UPC AS U ON SK.SKU_ID = U.SKU_ID INNER JOIN
MewLive.dbo.IB_INVENTORY_TOTAL AS IB ON SK.SKU_ID = IB.SKU_ID INNER JOIN
MewLive.dbo.LOCATION AS L ON IB.LOCATION_ID = L.LOCATION_ID INNER JOIN
MewLive.dbo.STYLE_COLOR AS SC ON ST.STYLE_ID = SC.STYLE_ID INNER JOIN
MewLive.dbo.COLOR AS C ON SC.COLOR_ID = C.COLOR_ID INNER JOIN
MewLive.dbo.STYLE_SIZE AS SS ON ST.STYLE_ID = SS.STYLE_ID INNER JOIN
MewLive.dbo.SIZE_MASTER AS SM ON SS.SIZE_MASTER_ID = SM.SIZE_MASTER_ID INNER JOIN
MewLive.dbo.STYLE_VENDOR AS SV ON ST.STYLE_ID = SV.STYLE_ID INNER JOIN
MewLive.dbo.VENDOR AS V ON SV.VENDOR_ID = V.VENDOR_ID INNER JOIN
MewLive.dbo.STYLE_DETAIL AS SD ON ST.STYLE_ID = SD.STYLE_ID
WHERE (U.UPC_TYPE = 1) AND (ST.ACTIVE_FLAG = 1)
That query pretty much crashes our server. I tried to fix the problem by breaking the query up into smaller queries, but the table variable I use causes tempdb to fill the hard drive. I figure this is because the server runs out of memory and crashes. Is there any way to solve this problem?
Have you tried using a real table instead of a temporary one? You can use SELECT INTO to create a real table to store the results instead.
The syntax would be:
SELECT
U.UPC_NUMBER,
REPLACE(ST.STYLE_CODE, '.', ''),
....
INTO
MEWLIVE
FROM
MewLive.dbo.STYLE AS ST INNER JOIN
...
The command will create the table, and it may help with the memory issues you are seeing.
Additionally, try looking at the execution plan in Query Analyzer, or try the Index Tuning Wizard to suggest indexes that may help speed up the query.
Try running the query from the Oracle server rather than from the SQL Server. As it stands, there's most likely going to be a lot of communication over the wire as the query processes.
By pre-processing the joins (maybe with a view), you'll only be sending the results over; a sketch follows.
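A minimal sketch of that idea, assuming a linked server named ORA and a view mew_live_v created on the Oracle side that performs the 12 joins there (both names are illustrative, as is the staging table):
SELECT *
INTO dbo.MewLiveStaging -- staging table created by SELECT INTO
FROM OPENQUERY(ORA, 'SELECT * FROM mew_live_v');
-- OPENQUERY sends the inner statement to Oracle as-is,
-- so only the joined result set travels over the wire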
Regarding the over-normalization: have you tested whether or not it's an issue in terms of speed? I find it hard to believe that it could be too normalized.
Proper indexing will definitely help, provided the number of rows returned by this query is not in the zillions.
Try the following:
- The join on dbo.COLOR is redundant if there is a foreign key dbo.STYLE_COLOR(COLOR_ID) => dbo.COLOR(COLOR_ID), since the query selects no columns from it.
- Proposed indexes (possibly excessive; review before deploying):
USE MewLive
CREATE INDEX ix1 ON dbo.STYLE (STYLE_ID)
INCLUDE (STYLE_CODE, LONG_DESC)
WHERE ACTIVE_FLAG = 1
GO
CREATE INDEX ix2 ON dbo.UPC (SKU_ID)
INCLUDE(UPC_NUMBER)
WHERE UPC_TYPE = 1
GO
CREATE INDEX ix3 ON dbo.SKU(STYLE_ID)
INCLUDE(SKU_ID)
GO
CREATE INDEX ix3_alternative ON dbo.SKU(SKU_ID)
INCLUDE(STYLE_ID)
GO
CREATE INDEX ix4 ON dbo.IB_INVENTORY_TOTAL(SKU_ID, LOCATION_ID)
INCLUDE(TOTAL_ON_HAND_RETAIL)
GO
CREATE INDEX ix5 ON dbo.LOCATION(LOCATION_ID)
INCLUDE(LOCATION_CODE)
GO
CREATE INDEX ix6 ON dbo.STYLE_COLOR(STYLE_ID)
INCLUDE(SHORT_DESC,COLOR_ID)
GO
CREATE INDEX ix7 ON dbo.COLOR(COLOR_ID)
GO
CREATE INDEX ixB ON dbo.STYLE_SIZE(STYLE_ID)
INCLUDE(SIZE_MASTER_ID)
GO
CREATE INDEX ix8 ON dbo.SIZE_MASTER(SIZE_MASTER_ID)
INCLUDE(PRIM_SIZE_LABEL)
GO
CREATE INDEX ix9 ON dbo.STYLE_VENDOR(STYLE_ID)
INCLUDE(VENDOR_ID)
GO
CREATE INDEX ixA ON dbo.VENDOR(VENDOR_ID)
INCLUDE(VENDOR_CODE)
GO
CREATE INDEX ixC ON dbo.STYLE_DETAIL(STYLE_ID)
INCLUDE(CURRENT_RETAIL)
In the SELECT list, replace U.UPC_TYPE with 1 AS UPC_TYPE (the WHERE clause already fixes UPC_TYPE = 1).
Can you segregate the imports - batch them by SKU/location/vendor/whatever and run multiple queries to get the data over? Is there a particular reason it all needs to go across in one hit, apart from the ease of writing the query? A rough batching sketch:
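Here dbo.Src and dbo.Dst are stand-ins for your real source query and target table, and the batch size is something to tune to your hardware:
DECLARE @from int = 0, @size int = 100000, @max int;
SELECT @max = MAX(Id) FROM dbo.Src;
WHILE @from <= @max
BEGIN
    -- each iteration moves one key range, keeping memory and log usage bounded
    INSERT INTO dbo.Dst(Id, Payload)
    SELECT Id, Payload
    FROM dbo.Src
    WHERE Id > @from AND Id <= @from + @size;
    SET @from += @size;
END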