Indexing and optimization of where clause based on datetime field - sql-server

I have a database table with more than a million rows. When I execute this query it takes hours, mostly due to PAGEIOLATCH_SH waits. There are currently no indexes. Can you suggest indexes to support the WHERE clause? I believe one should go on the datetime column, since it is used in the WHERE clause as well as the ORDER BY; if so, which kind of index should I use?
IF (<some condition>)
BEGIN
    SELECT <some columns>
    FROM <some tables with joins (NOLOCK)>
    WHERE
        ((@var2 IS NULL AND a.addr IS NOT NULL) OR
         (a.addr LIKE @var2 + '%')) AND
        ((@var3 IS NULL AND a.ca_id IS NOT NULL) OR
         (a.ca_id = @var3)) AND
        b.time >= @from_datetime AND b.time <= @to_datetime AND
        (
            (
                b.shopping_product IN ('CX12343', 'BG8945', 'GF4543') AND
                b.shopping_category IN ('online', 'COD')
            )
            OR
            (
                b.shopping_product = 'LX3454' AND b.sub_shopping_list IN ('FF544', 'GT544', 'KK543', 'LK5343')
            )
            OR
            (
                b.shopping_product = 'LK434434' AND b.sub_shopping_list IN ('LL5435', 'PO89554', 'IO948854', 'OR4334', 'TH5444')
            )
            OR
            (
                b.shopping_product = 'AZ434434' AND b.sub_shopping_list IN ('LL54352', 'PO489554', 'IO9458854', 'OR34334', 'TH54344')
            )
        )
    ORDER BY
        b.time DESC
END
ELSE
BEGIN
    SELECT <some columns>
    FROM <some tables with joins (NOLOCK)>
    WHERE <similar WHERE as above with slight differences>
END

Okay then.
I said: "First, add indexes on shopping_product, shopping_category and sub_shopping_list; secondly, you can try one on the date. After that, check the execution plan (or it might be better to create a partition on the time column)."
I'm working on Oracle, but the basics are the same.
You can create three distinct indexes on those columns: shopping_product, shopping_category, sub_shopping_list. Or you can create one composite index covering all three. The point is that you need to examine the execution plan to see which one is most effective for you.
Oh, and there is the a.ca_id column (I almost forgot) - you need an index for that too.
For the date column I think you would be better off creating a partition instead of an index.
In summary, two ways (a sketch of the second follows below):
- create 4 distinct indexes (shopping_product, shopping_category, sub_shopping_list, ca_id), and create a range partition on the date column
- create 1 composite index (shopping_product, shopping_category, sub_shopping_list) and 1 normal index (ca_id), and create a range partition on the date column
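To make the second option concrete, here is a minimal T-SQL sketch. The real table names behind the aliases a and b are not given in the question, so ShoppingTable and AddressTable are hypothetical placeholders:
-- ShoppingTable and AddressTable are hypothetical; substitute the real tables behind b and a.
CREATE NONCLUSTERED INDEX IX_Shopping_Product_Category_SubList
    ON dbo.ShoppingTable (shopping_product, shopping_category, sub_shopping_list);

CREATE NONCLUSTERED INDEX IX_Address_CaId
    ON dbo.AddressTable (ca_id);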

You should probably learn about indexing if you're dealing with tables of this size. It's not a trivial process. JOIN operations are a big deal when sorting out which indexes you need. Read this: http://use-the-index-luke.com/
In the meantime, if your date range is highly selective (that is, if
b.time >= @from_datetime AND b.time <= @to_datetime
chooses a reasonably small fraction of the rows in your database), you should try the following compound index:
b.shopping_product, b.time
If that doesn't help, try
b.time
by itself. The idea is to structure your index so the server can do a range scan. Without knowledge of your whole query, there's not much else to offer.
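In DDL form those two candidates might look like this; the actual table behind the alias b is not named in the question, so ShoppingTable is a hypothetical placeholder:
-- Candidate 1: compound index supporting the product filter plus a time range scan.
CREATE NONCLUSTERED INDEX IX_Shopping_Product_Time
    ON dbo.ShoppingTable (shopping_product, [time]);

-- Candidate 2: the time column alone, if the compound index doesn't help.
CREATE NONCLUSTERED INDEX IX_Shopping_Time
    ON dbo.ShoppingTable ([time]);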


Select a random row from Oracle DB in a performant way

Using: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0
I am trying to fetch a random row. As suggested in other Stack Overflow questions, I used DBMS_RANDOM.VALUE like this -
SELECT column FROM
( SELECT column
FROM table
WHERE COLUMN_VALUE = 'Y' -- value of COLUMN_VALUE
ORDER BY dbms_random.value
)
WHERE rownum <= 1
But this query isn't performant when the number of requests increases.
So I am looking for an alternative.
SAMPLE wouldn't work for me because the sample picked up through the clause wouldn't have a dataset that matches my WHERE clause. The query looked like this -
SELECT column FROM table SAMPLE(1) WHERE COLUMN_VALUE = 'Y'
Because the SAMPLE is applied before my WHERE clause, most times this returns no data.
P.S.: I am OK with moving some of the logic to the application layer (though I am definitely not looking for answers that suggest loading everything into memory).
The performance problem consists of two aspects:
selecting the data with column_value = 'Y', and
sorting this subset to get a random record.
You didn't say whether the subset of your table with column_value = 'Y' is large or small. This is important and will drive your strategy.
If there are lots of records with column_value = 'Y', use SAMPLE to limit the rows to be sorted.
You are right, this can lead to an empty result - in that case repeat the query (you may additionally add logic that increases the sample percentage to avoid lots of repeats; a sketch of that follows the query below). This boosts performance because you sort only a sample of the data:
select id from (
select id from tt SAMPLE(1) where column_value = 'Y' order by dbms_random.value )
where rownum <= 1;
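A minimal PL/SQL sketch of that retry idea, assuming the table tt and the columns id and column_value from the examples; the SAMPLE clause does not accept bind variables, so dynamic SQL is used, and doubling the percentage is an illustrative choice:
DECLARE
  v_id  tt.id%TYPE;
  v_pct NUMBER := 1;  -- starting sample percentage (illustrative)
BEGIN
  LOOP
    BEGIN
      EXECUTE IMMEDIATE
        'SELECT id FROM (
           SELECT id FROM tt SAMPLE(' || v_pct || ') WHERE column_value = ''Y''
           ORDER BY dbms_random.value
         ) WHERE rownum <= 1'
        INTO v_id;
      EXIT;  -- got a row, stop retrying
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        v_pct := LEAST(v_pct * 2, 99);  -- widen the sample; SAMPLE must stay below 100
    END;
  END LOOP;
END;
/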
If there are only a few records with column_value = 'Y', define an index on this column (or a separate partition) - this enables efficient access to the records. Use the ORDER BY dbms_random.value approach; the sort will not degrade performance for a small number of rows.
select id from (
select id from tt where column_value = 'Y' order by dbms_random.value )
where rownum <= 1;
Basically, both approaches keep the sorted row set small. The first approach performs a table access comparable to a FULL TABLE SCAN; the second performs an INDEX ACCESS for the selected column_value.

Optimize SQL in MS SQL Server that returns more than 90% of records in the table

I have the SQL below:
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE ( BP.IsDeleted IS NULL
OR BP.IsDeleted = 0 )
GROUP BY BP.BoundProjectId
I already have an index on the table BoundProducts on this column order (BoundProjectId, IsDeleted)
Currently this query takes around 2-3 seconds to return the result. I am trying to reduce it to zero seconds.
This query returns 25077 rows as of now.
Please provide any ideas to improve the query.
Looking at this from a bit different point of view: I think your OR condition is hurting your query, so why not rewrite it like this?
SELECT CAST(FORMAT(SUM(COALESCE(BP.InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet
, BP.BoundProjectId AS ProjectId
FROM (
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted IS NULL
UNION ALL
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted = 0
) AS BP
GROUP BY BP.BoundProjectId;
I've had better experiences with UNION ALL than with OR.
I think it should work exactly the same. On top of that, I'd create this index:
CREATE NONCLUSTERED INDEX idx_BoundProducts_IsDeleted_BoundProjectId_iInstalledSubTotal
ON dbo.BoundProducts (IsDeleted, BoundProjectId)
INCLUDE (InstalledSubTotal);
It should satisfy your query conditions and allow a decent index seek. I know it's generally not a good idea to index bit fields, but it's worth trying.
P.S. Why not default your IsDeleted column to 0 and make it NOT NULL? With that done, a simple WHERE IsDeleted = 0 check would be enough, which would help your query too.
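A minimal sketch of that change, assuming IsDeleted is (or can become) a BIT column; the constraint name is illustrative:
-- Backfill existing NULLs first, then tighten the column and add the default.
UPDATE dbo.BoundProducts SET IsDeleted = 0 WHERE IsDeleted IS NULL;

ALTER TABLE dbo.BoundProducts
    ALTER COLUMN IsDeleted BIT NOT NULL;

ALTER TABLE dbo.BoundProducts
    ADD CONSTRAINT DF_BoundProducts_IsDeleted DEFAULT 0 FOR IsDeleted;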
If you really want to try an index seek, it should be possible with the FORCESEEK query hint, but I don't think it's going to make anything faster.
The options I suggested last time are still valid: remove FORMAT and/or create an indexed view (a sketch follows).
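A hedged sketch of the indexed-view idea; it assumes IsDeleted has been made NOT NULL as suggested above, and the view and index names are illustrative. Indexed views need SCHEMABINDING, COUNT_BIG(*) alongside GROUP BY, and SUM over a non-nullable expression, hence the ISNULL:
CREATE VIEW dbo.vBoundProductTotals
WITH SCHEMABINDING
AS
SELECT BoundProjectId,
       SUM(ISNULL(InstalledSubtotal, 0)) AS TotalSoldNet,
       COUNT_BIG(*) AS RowCnt
FROM dbo.BoundProducts
WHERE IsDeleted = 0
GROUP BY BoundProjectId;
GO
-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vBoundProductTotals
    ON dbo.vBoundProductTotals (BoundProjectId);
Querying the view WITH (NOEXPAND) then reads the precomputed totals directly (on non-Enterprise editions NOEXPAND is required for the view's index to be used).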
You should also test whether the problem is the query itself or just displaying the results afterwards, for example by trying it with SELECT ... INTO #tmp. If that's fast, then the problem is not the query.
The index name in the screenshot is not the same as in the CREATE TABLE statement, but I assume that's just a name you changed for the question. If the scan is happening on another index, then you should include that too.

Why this query is running so slow?

This query runs very fast (<100 msec):
SELECT TOP (10)
[Extent2].[CompanyId] AS [CompanyId]
,[Extent1].[Id] AS [Id]
,[Extent1].[Status] AS [Status]
FROM [dbo].[SplittedSms] AS [Extent1]
INNER JOIN [dbo].[Sms] AS [Extent2]
ON [Extent1].[SmsId] = [Extent2].[Id]
WHERE [Extent2].[CompanyId] = 4563
AND ([Extent1].[NotifiedToClient] IS NULL)
If I add just a time filter, it takes too long (22 seconds!):
SELECT TOP (10)
[Extent2].[CompanyId] AS [CompanyId]
,[Extent1].[Id] AS [Id]
,[Extent1].[Status] AS [Status]
FROM [dbo].[SplittedSms] AS [Extent1]
INNER JOIN [dbo].[Sms] AS [Extent2]
ON [Extent1].[SmsId] = [Extent2].[Id]
WHERE [Extent2].Time > '2015-04-10'
AND [Extent2].[CompanyId] = 4563
AND ([Extent1].[NotifiedToClient] IS NULL)
I tried adding an index on the [Time] column of the Sms table, but the optimizer doesn't seem to use it. I tried forcing it with WITH (INDEX (Ix_Sms_Time)), but to my surprise it takes even more time (29 seconds!).
Here is the actual execution plan:
The execution plan is the same for both queries. The tables mentioned here have 5M to 8M rows (indexes are < 1% fragmented and stats are updated). I am using MS SQL Server 2008 R2 on a 16-core, 32 GB memory Windows 2008 R2 machine.
Does it help when you force the time filter to kick in only after the client filter has run?
Like in this example:
;WITH ClientData AS (
SELECT
[E2].[CompanyId]
,[E2].[Time]
,[E1].[Id]
,[E1].[Status]
FROM [dbo].[SplittedSms] AS [E1]
INNER JOIN [dbo].[Sms] AS [E2]
ON [E1].[SmsId] = [E2].[Id]
WHERE [E2].[CompanyId] = 4563
AND ([E1].[NotifiedToClient] IS NULL)
)
SELECT TOP 10
[CompanyId]
,[Id]
,[Status]
FROM ClientData
WHERE [Time] > '2015-04-10'
Create an index on Sms with the following index key columns (in this order):
CompanyId
Time
You may or may not need to add Id as an included column; a sketch follows.
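In DDL form, that suggestion might look like this (the index name is illustrative):
CREATE NONCLUSTERED INDEX IX_Sms_CompanyId_Time
    ON dbo.Sms (CompanyId, [Time])
    INCLUDE (Id);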
What datatype is your Time column?
If it's datetime, try converting your '2015-04-10' into the equivalent datatype so that the index can be used:
Declare @test datetime
Set @test = '2015-04-10'
Then modify your condition:
[Extent2].Time > @test
SQL Server implicitly casts when there is a datatype mismatch, and a cast or function applied to the column prevents index use.
I'm on the same track as @JonTirjan: the index with just Time results in a lot of key lookups, so you should try at least the following:
create index xxx on Sms (Time, CompanyId) include (Id)
or
create index xxx on Sms (CompanyId, Time) include (Id)
If Id is your clustered index key, then it's not needed in the INCLUDE clause. If a significant part of your data belongs to CompanyId 4563, it might be worthwhile to have it as an included column too.
The percentages you see in the actual plan are just estimates based on row count assumptions, so they are sometimes totally wrong. Looking at the actual number of rows / executions plus STATISTICS IO output should give you an idea of what's actually happening.
Two things come to mind:
By adding an extra restriction, it becomes 'harder' for the database to find the first 10 items that match your restrictions. Finding the first 10 rows out of, say, 10,000 matching items (from a total of 1 million) is easier than finding the first 10 rows out of maybe 100 matching items (from the same 1 million).
The index is probably not being used because it is created on a datetime column, which is not very efficient if you are also storing the time of day in it. You might want to create a clustered index on the [Time] column (but then you would have to remove the clustered index which is now on the [CompanyId] column), or you could create a computed column that stores the date part of [Time], create an index on that computed column, and filter on it; see the sketch below.
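A minimal sketch of the computed-column variant (the column and index names are illustrative; CAST to DATE is deterministic, so the computed column can be indexed):
ALTER TABLE dbo.Sms
    ADD TimeDate AS CAST([Time] AS DATE) PERSISTED;

CREATE NONCLUSTERED INDEX IX_Sms_TimeDate
    ON dbo.Sms (TimeDate);
The query would then filter on TimeDate instead of [Time].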
I found out that there was no index on the foreign key column (SmsId) of the SplittedSms table. I created one, and now the second query is almost as fast as the first.
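For reference, that fix would look something like this (the index name is illustrative):
CREATE NONCLUSTERED INDEX IX_SplittedSms_SmsId
    ON dbo.SplittedSms (SmsId);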
The execution plan now:
Thanks everyone for the effort.

determine order of operations in query

Say I have a query like this:
SELECT *
FROM Foo
WHERE Name IN ('name1', 'name2')
AND (Date<'2013-01-01' AND Date>'2010-01-01')
AND Type = 1
Is there a way to force the SQL server to evaluate the expressions in the order I determine and not what the query optimizer says? For example I want the IN clause evaluated first, the output of that evaluated by Type = 1 and finally the dates, in EXACTLY that order.
Yes, it is largely possible (though there are some caveats and counter-examples discussed in the answers here):
SELECT *
FROM Foo
WHERE 1 = CASE
WHEN Name IN ( 'name1', 'name2' ) THEN
CASE
WHEN Type = 1 THEN
CASE
WHEN ( Date < '2013-01-01'
AND Date > '2010-01-01' ) THEN 1
END
END
END
But why bother? There are only very limited circumstances in which I can see this would be useful (e.g. preventing divide by zero if an earlier predicate evaluated to 0).
Wrapping the predicates up like this makes the query completely unsargable and prevents index usage for any of the three (otherwise sargable) predicates. It guarantees a full scan reading all rows.
To see an example of this
CREATE TABLE Foo
(
Id INT IDENTITY PRIMARY KEY,
Name VARCHAR(10),
[Date] DATE,
[Type] TINYINT,
Filler CHAR(8000) NULL
)
CREATE NONCLUSTERED INDEX IX_Name
ON Foo(Name)
CREATE NONCLUSTERED INDEX IX_Date
ON Foo(Date)
CREATE NONCLUSTERED INDEX IX_Type
ON Foo(Type)
INSERT INTO Foo
(Name,
[Date],
[Type])
SELECT TOP (100000) 'name' + CAST(0 + CRYPT_GEN_RANDOM(1) AS VARCHAR),
DATEADD(DAY, 7 * CRYPT_GEN_RANDOM(1), '2012-01-01'),
0 + CRYPT_GEN_RANDOM(1)
FROM master..spt_values v1,
master..spt_values v2
Running the original query from the question vs. this query gives the following plans.
Note that the second query is costed as 100% of the cost of the batch.
The query optimizer, left to its own devices, first seeks into the 414 rows matching the Type predicate and uses them as the build input for a hash table. It then seeks into the 728 rows matching the Name, probes each against the hash table, and for the 4 that match it performs a key lookup for the other columns and evaluates the Date predicate against those. Finally it returns the single matching row.
The second query just ploughs through all the rows in the table and evaluates the predicates in the desired order. The difference in the number of pages read is significant.
Original Query
Table 'Foo'. Scan count 3, logical reads 23,
Table 'Worktable'. Scan count 0, logical reads 0
Nested case
Table 'Foo'. Scan count 1, logical reads 100373
Short answer: no!
You can try using brackets, hints, studying the query plan, etc.
But is it wise to mess with the engine/optimizer that way?
You'll need a lot of study and experience to outsmart the optimizer; that said, please let the engine take care of those details for you.

Sybase query optimization

I'm looking at how we can improve the performance of the following Sybase query. Currently it takes about 1.5 hrs.
CREATE TABLE #TempTable
(
T_ID numeric,
M_ID numeric,
M_USR_NAME char(10),
M_USR_GROUP char(10),
M_CMP_DATE datetime,
M_CMP_TIME numeric,
M_TYPE char(10),
M_ACTION char(15)
)
select
T.M_USR_NAME,
T.M_USR_GROUP,
T.M_CMP_DATE,
T.M_CMP_TIME,
T.M_TYPE,
T.M_ACTION
from #TempTable T, AUD_TN B
where T.M_ID=B.M_ID
and T.T_ID in
(
select M_NB from TRN H where (M_BENTITY ="KROP" or M_SENTITY = "KROP")
)
UNION
select
A.M_USR_NAME,
A.M_USR_GROUP,
A.M_DATE_CMP,
A.M_TIME_CMP,
A.M_TYPE,
A.M_ACTION
from AUD_VAL A, TRN H
where A.M_DATE_CMP >= '1 May 2012' and A.M_DATE_CMP <= '31 May 2012'
and A.M_ACT_NB0=H.M_NB
and (H.M_BENTITY ="KROP" or H.M_SENTITY = "KROP")
UNION
select
TR.M_USR_NAME,
TR.M_USR_GROUP,
TR.M_DATE_CMP,
TR.M_TIME_CMP,
TR.M_TYPE,
TR.M_ACTION
from TRN_AUD TR, TRN H
where TR.M_DATE_CMP >= '1 May 2012' and TR.M_DATE_CMP <= '31 May 2012'
and TR.M_ACT_NB0=H.M_NB
and (H.M_BENTITY ="KROP" or H.M_SENTITY = "KROP")
DROP table #TempTable
Any help is greatly appreciated. Please note the following:
The only table which is not indexed above is AUD_TN
Cheers
RC
Presumably the temporary table is populated, and with a lot of rows?
The temp table doesn't need to be indexed, but all joins in that part will need to use indexes.
Why not try each part of the UNION separately to find out whether one of them is slow?
Are you okay using SET SHOWPLAN ON? You probably need that as well - you need to be able to check that Sybase is using indexes to join correctly.
Are TRN's M_BENTITY and M_SENTITY columns indexed? If not, your IN is going to be a bit slow, although it might be okay, doing a single table scan into a worktable that Sybase will index internally. Also try an EXISTS instead - that might/should work better; see the sketch below.
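A hedged sketch of that EXISTS rewrite for the first branch of the UNION, using the tables from the question:
select T.M_USR_NAME, T.M_USR_GROUP, T.M_CMP_DATE, T.M_CMP_TIME, T.M_TYPE, T.M_ACTION
from #TempTable T, AUD_TN B
where T.M_ID = B.M_ID
and exists (select 1
            from TRN H
            where H.M_NB = T.T_ID
            and (H.M_BENTITY = "KROP" or H.M_SENTITY = "KROP"))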
2nd part - both date predicates are SARGs (search arguments - look them up in Sybooks if you don't know them). I don't know what proportion of rows they match, but assuming it's a small fraction, you should see an index used on a SARG for whichever table is scanned first, then an index join (or perhaps a merge join) to the second table - but using indexes.
3rd part - similar discussion to the 2nd.
I reckon it'll be the 2nd or 3rd part.
How about using a cache for these tables, if the query is run on a regular basis? It's better to get a named cache and bind the tables to it. Also bind tempdb to a cache. This will greatly improve the process execution time. If the temp table is huge, then you can create an index on it, which may help with performance, but I'd need some more details for that. A sketch of the cache binding follows.
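A hedged Sybase ASE sketch of the named-cache idea; the cache names, sizes, and database name are illustrative placeholders:
-- Create a named cache (may require configuration privileges and memory headroom).
sp_cacheconfig 'audit_cache', '200M'
go
-- Bind the hot tables to it (mydb is a placeholder database name).
sp_bindcache 'audit_cache', 'mydb', 'AUD_TN'
go
sp_bindcache 'audit_cache', 'mydb', 'TRN'
go
-- Optionally give tempdb its own cache, as suggested above.
sp_cacheconfig 'tempdb_cache', '100M'
go
sp_bindcache 'tempdb_cache', 'tempdb'
go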
If you still have this issue open:
1) Try this at the top of the SQL batch:
set showplan on
set noexec on
See if the expected indexes are being picked up by the SQL optimizer. If no indexes exist on the columns in the where clause, please create them. Create a clustered index if possible.
2) In the first query you can replace the subquery in the where clause with a temp table:
create table #T_ID (
M_NB datatype
)
insert into #T_ID
select M_NB from TRN H where (M_BENTITY = "KROP" or M_SENTITY = "KROP")
and then join against it instead:
from #TempTable T, AUD_TN B, #T_ID X
where T.M_ID = B.M_ID
and T.T_ID = X.M_NB
