Inefficient Query Plans SQL Server 2008 R2 - sql-server

Good Day,
We are experiencing ongoing issues with our databases which our internal DBAs are unable to explain.
Using the query example below:
Select Distinct
Date,
AccountNumber,
Region,
Discount,
ActiveBalance
Into
#sometemptable
From
anothertable With (Index(ondate)) --use this or the query takes much longer
Where
Date >='7/1/2013'
And ActiveBalance > 0
And Discount <> '0' and discount is not null
This query will often run for an hour plus before I end up needing to kill it.
However, if I run the query as follows:
Select Distinct
Date,
AccountNumber,
Region,
Discount,
ActiveBalance
Into
#sometemptable
From
anothertable With (Index(ondate)) --use this or the query takes much longer
Where
Date Between '7/1/2013' and '12/1/2013' --all of the dates are the first of the month
And ActiveBalance > 0
And Discount <> '0' and discount is not null
Followed by
Insert into #sometemptable
Select Distinct
Date,
AccountNumber,
Region,
Discount,
ActiveBalance
From
anothertable With (Index(ondate)) --use this or the query takes much longer
Where
Date Between '1/1/2014' and '6/1/2014' --all of the dates are the first of the month
And ActiveBalance > 0
And Discount <> '0' and discount is not null
I can run the query in less than 10 minutes. The particular tables I'm hitting are updated monthly. Statistics updates are run on these tables both monthly and weekly. Our DBAs, as mentioned before, do not understand why the top query takes so much longer than the combination of the smaller queries.
Any ideas? Any suggestions would be greatly appreciated!
Thanks,
Ron

This is just a guess, but when you use Date >= '7/1/2013', SQL Server estimates approximately how many rows the predicate will return, and if that estimate is greater than some internal threshold it will do a scan instead of a seek, reasoning that it has to return enough data that a scan will be faster.
When you use the BETWEEN clause, SQL Server will do a seek because it knows it will not need to return the majority of the rows in that table.
I assume it is doing a table scan when you do the >= search. Once you post the execution plans, we will see for sure.
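One way to check this theory before posting the plans (a rough sketch only, reusing the table and index names from your post, and not tested against your data) is to compare the I/O for the two predicates and to see whether forcing a seek helps the open-ended range. FORCESEEK is available from SQL Server 2008 onward:
Set Statistics IO On
-- Compare logical reads for the open-ended and the bounded date range
Select Count(*) From anothertable Where Date >= '7/1/2013'
Select Count(*) From anothertable Where Date Between '7/1/2013' And '6/1/2014'
-- If the open-ended range tips into a scan, a FORCESEEK hint shows whether
-- a seek on the ondate index would actually have been cheaper
Select Distinct Date, AccountNumber, Region, Discount, ActiveBalance
From anothertable With (ForceSeek)
Where Date >= '7/1/2013' And ActiveBalance > 0 And Discount <> '0' And Discount Is Not Null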

Related

Inner join query logic

I am using SQL Server Management Studio.
I have the following table.
When I run the following query, I get the running total for the sales column:
select s1.date,
sum(s2.sales)
from sales s1
join sales s2 on s1.date>=s2.date
group by s1.date;
But when I substitute s2.sales with s1.sales in the select:
select s1.date,
sum(s1.sales)
from sales s1
join sales s2 on s1.date>=s2.date
group by s1.date;
It gives me a different answer. Can someone help me understand why this happens, since the sales column value should be the same?
The first version of your running total query sums sales, for each date, over all dates less than or equal to the date in each record. When you change s2.sales to s1.sales, you are instead summing the current record's sales N times, where N is the number of records having an earlier or equal date. This clearly is not the logic you want, so stick with the first version.
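A quick illustration with made-up data (the three rows below are hypothetical, not taken from your table):
declare @sales table ([date] date, sales int)
insert into @sales values ('2024-01-01', 10), ('2024-01-02', 20), ('2024-01-03', 30)
select s1.[date],
sum(s2.sales) as running_total,  -- 10, 30, 60: the true running total
sum(s1.sales) as inflated_total  -- 10, 40, 90: the current row's sales counted once per matching row
from @sales s1
join @sales s2 on s1.[date] >= s2.[date]
group by s1.[date]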
By the way, if you're on SQL Server 2012 or later (or MySQL 8+), analytic functions simplify things even further:
SELECT Date, Sales, SUM(Sales) OVER (ORDER BY Date) RunningSales
FROM sales
ORDER BY Date;

In T-SQL, Why Does an Aggregate On a Subquery Run Faster

The following two queries have the exact same execution plan. They go against the same heap table, which has no indexes. I am only returning the top 5000 rows because this table is pretty huge.
Also, this table is read thousands of times a day and is refreshed nightly, so I am pretty sure the entire table is in memory.
Query 1 (Standard Aggregate, ran in 1:16)
SELECT TOP 5000 DB,
SourceID,
Date,
Units = SUM(Units),
WeightedUnits = SUM(WeightedUnits)
FROM dbo.Stats_Master_Charge_Details
GROUP BY DB,
SourceID,
Date
Query 2 (Subquery, ran in 1:11)
SELECT TOP 5000 x.DB,
x.SourceID,
x.Date,
Units = SUM(x.Units),
WeightedUnits = SUM(x.WeightedUnits)
FROM (SELECT DB,
SourceID,
Date,
Units,
WeightedUnits
FROM dbo.Stats_Master_Charge_Details) x
GROUP BY x.DB,
x.SourceID,
x.Date
Here is an image of the execution plan as well.
What am I missing here? Why would the subquery version be faster? I would have expected exactly the same results.

Running concurrent/parallel update statements (T-SQL)

I have a table that is basically records of items, with columns for each day of the month. So each row is ITEM, Day1, Day2, Day3, ... I have to run update statements that trawl through each row day by day, with the current day's calculation requiring some information from the previous day.
We have required daily quantities. Because the order goes out in boxes (which are a fixed size) and the calculated quantities are in pieces, the system has to round up to the next whole number of boxes. Any "extra quantity" is carried over to the next day to reduce the number of boxes.
For example, for ONE of those records in the table described earlier (the box size is 100):
My current code basically gets the record, calculates the requirements for that day, increments the day by one, and repeats. I have to do this for each record. It's very inefficient, especially since it runs sequentially for each record.
Is there any way to parallelize this on SQL Server Standard? I'm thinking of something like a buffer where I would submit each row as a job and the system would manage the resources and run the queries.
If the buffer idea is not feasible, is there any way to 'chunk' these rows and run the chunks in parallel?
Not sure if this helps, but I played around with your data and was able to calculate the figures without any row-by-row handling. I transposed the figures with UNPIVOT and calculated the values using a running total plus LAG, so this requires SQL Server 2012 or newer:
declare @BOX int = 100 -- box size in pieces
; with C1 as (
-- transpose the DayN columns into one row per day
SELECT
Day, Quantity
FROM
(SELECT * from Table1 where Type = 'Quantity') T1
UNPIVOT
(Quantity FOR Day IN (Day1, Day2, Day3, Day4)) AS up
),
C2 as (
-- running total of the rounding surplus, kept below one full box
select Day, Quantity,
sum(ceiling(convert(numeric(5,2), Quantity) / @BOX) * @BOX - Quantity)
over (order by Day asc) % @BOX as Extra
from C1
),
C3 as (
-- subtract the previous day's carry-over from the current day's quantity
select
Day, Quantity,
Quantity - isnull(Lag(Extra) over (order by Day asc),0) as Required,
Extra
from C2
)
select
Day, Quantity, Required,
ceiling(convert(numeric(5,2), Required) / @BOX) as Boxes, Extra
from C3
Example in SQL Fiddle
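For reference, here is the rough shape of source data the query above assumes (the original example table was posted as an image, so these figures are made up):
create table Table1 (Type varchar(20), Day1 int, Day2 int, Day3 int, Day4 int)
insert into Table1 values ('Quantity', 150, 120, 170, 90)
-- With @BOX = 100 the query then returns, per day:
--   Day1: Required 150, Boxes 2, Extra 50
--   Day2: Required  70, Boxes 1, Extra 30
--   Day3: Required 140, Boxes 2, Extra 60
--   Day4: Required  30, Boxes 1, Extra 70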

Is there a quicker way of doing this type of query (finding inactive accounts)?

I have a very large table of wagering transactions. Let's say for the sake of the question I want to find the accounts of people who have wagered in the last year but not wagered in the last month, so I do something like this...
--query one
select accountnumber into #wageredrecently from activity
where _date >='2011-08-10' and transaction_type = 'Bet'
group by accountnumber
--query two
select accountnumber,firstname,lastname,email,sum(handle)
from activity a, customers c
where a.accountnumber = c.accountno
and transaction_type = 'Bet'
and _date >='2010-09-10'
and accountnumber not in (select * from #wageredrecently)
group by accountnumber,firstname,lastname,email
The problem is, this takes ages to get the data. Is there a quicker way to achieve the same thing in SQL?
Edit, just to be specific about the time: it takes just over 3 minutes, which is far too long for a query that is destined for a PHP intranet page.
Edit (11/09/2011): I've found out that the problem is the customers table. It's actually a view. It previously had good performance, but now all of a sudden its performance is terrible; a simple query on it takes almost as long as the query pair above. I have therefore chosen an alternative table of customer data (one that actually is a table, and not a view), and now the query pair takes about 15 seconds.
You should try to join customers after you have found and aggregated the rows from activity (I assume that handle is a column in activity).
select c.accountno,
c.firstname,
c.lastname,
c.email,
a.sumhandle
from customers as c
inner join (
select accountnumber,
sum(handle) as sumhandle
from activity
where _date >= '2010-09-10' and
transaction_type = 'bet' and
accountnumber not in (
select accountnumber
from activity
where _date >= '2011-08-10' and
transaction_type = 'bet'
)
group by accountnumber
) as a
on c.accountno = a.accountnumber
I also included your first query as a sub-query instead. I'm not sure what that will do for performance; it could be better, it could be worse, so you have to test on your data.
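If the NOT IN part turns out to be the slow piece (it can behave badly when the subquery column is nullable), a NOT EXISTS version of the same inner query is worth trying; this is only a sketch against the column names above, not something tested on your data:
select accountnumber,
sum(handle) as sumhandle
from activity as act
where _date >= '2010-09-10' and
transaction_type = 'bet' and
not exists (
select 1
from activity as recent
where recent.accountnumber = act.accountnumber and
recent._date >= '2011-08-10' and
recent.transaction_type = 'bet'
)
group by accountnumber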
I don't know your exact business need, but rarely will someone need access to inactive accounts over several months at a moment's notice. Depending on when you purge data, this may get worse.
You could create an indexed view or summary table that contains the last transaction date for each account:
max(_date) as RecentTransaction
(In SQL Server an indexed view cannot contain MAX, so in practice this would likely be a summary table maintained by your load process.) If this table gets too large, it could be partitioned by year or month of the activity.
Have you considered adding an index on _date to the activity table? It's probably taking so long because it has to do a full table scan when comparing the dates. Also, is transaction_type indexed as well? Otherwise, an index on _date alone won't do you much good.
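As a rough sketch of that suggestion (the index name and INCLUDE columns are guesses based on the query, so adjust them to your schema):
create nonclustered index IX_activity_type_date
on activity (transaction_type, _date)
include (accountnumber, handle)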
Answering my own question, as the problem wasn't the structure of the query but one of the tables being used. It was a view, and its performance was terrible. I changed to an actual table containing the customer data and reduced the execution time to about 15 seconds.

Count number of 'overlapping' rows in SQL Server

I've been asked to look at a database that records user login and logout activity - there's a column for login time and then another column to record logout, both in OLE format. I need to pull together some information about user concurrency - i.e. how many users were logged in at the same time each day.
Does anyone know how to do this in SQL? I don't really need the detail, just the count per day.
Thanks in advance.
The easiest way is to build a times_table from an auxiliary numbers table (by adding 0 to 24 * 60 minutes to a base time) so that you have every minute in a given 24-hour period:
SELECT MAX(simul) FROM (
SELECT test_time
,COUNT(*) AS simul
FROM your_login_table
INNER JOIN times_table -- a table/view/subquery of all times during the day
ON your_login_table.login_time <= times_table.test_time AND times_table.test_time <= your_login_table.logout_time
GROUP BY test_time
) AS simul_users (test_time, simul)
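In case it helps, one way to generate that times_table on the fly (dbo.Numbers here is a hypothetical tally table with an integer column n starting at 0):
SELECT DATEADD(minute, n, CONVERT(datetime, '2011-09-10')) AS test_time  -- '2011-09-10' stands in for the day being analysed
FROM dbo.Numbers
WHERE n BETWEEN 0 AND 24 * 60 - 1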
I think this will work.
Select C.Day, Max(C.Concurrency) as MostConcurrentUsersByDay
FROM
(
SELECT convert(varchar(10),L1.StartTime,101) as day, count(*) as Concurrency
FROM login_table L1
INNER JOIN login_table L2
ON (L2.StartTime>=L1.StartTime AND L2.StartTime<=L1.EndTime) OR
(L2.EndTime>=L1.StartTime AND L2.EndTime<=L1.EndTime)
WHERE (L1.EndTime is not null) AND (L2.EndTime is not null) AND (L1.ID<>L2.ID)
GROUP BY convert(varchar(10),L1.StartTime,101)
) as C
Group BY C.Day
Unchecked... but the idea is to lose the date values, count by time of day, and use "end of day" for sessions that are still logged in.
This assumes "logintime" is a date and a time. If not, the derived table can be removed (you still need the ISNULL though). Of course, SQL Server 2008 has the "time" data type to make this easier too.
SELECT
COUNT(*)
FROM
(
SELECT
DATEADD(day, DATEDIFF(day, logintime, 0), logintime) AS inTimeOnly,
ISNULL(DATEADD(day, DATEDIFF(day, logouttime, 0), logouttime), '1900-01-01 23:59:59.997') AS outTimeOnly
FROM
mytable
) foo
WHERE
inTimeOnly <= @TheTimeOnly AND outTimeOnly >= @TheTimeOnly
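For completeness, @TheTimeOnly would be declared before the query as a time of day on the same 1900-01-01 base date, for example:
DECLARE @TheTimeOnly datetime = '1900-01-01 14:30:00'  -- 2:30 PM, chosen arbitrarily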
