SQL Server 2012: how to optimize this LIKE % query

This query takes too long, so I am trying to optimize it. Do you have any ideas or suggestions?
I tried full-text search in a procedure with a while loop, but it got worse. (dbo.url has more than 100,000 rows; only about 1,000 of them have status = 'tocheck'.)
select tocheck.*
from dbo.url tocheck inner join dbo.url done
on tocheck.id != done.id
and tocheck.url like done.url+'%'
and done.status in ('tocheck','todo','done')
where tocheck.status = 'tocheck'
Edit:
I call a web service multiple times with different URLs.
URLs look like http://ws.com/query?p1=a&p2=b (url1).
If I have already called http://ws.com/query?p1=a (url2), I don't want to call url1, because:
url1 like url2+'%'
Thanks for your help.
Edit 2:
I added a column suburl that contains 'query?p1=a' for each URL and modified the query:
select tocheck.*
from dbo.url tocheck inner join dbo.url done
on tocheck.id != done.id
and tocheck.suburl = done.suburl --NEW
and tocheck.url like done.url+'%'
and done.status in ('tocheck','todo','done')
where tocheck.status = 'tocheck'
More than 10 times faster... Phew!
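A small runnable sketch of why the extra equality helps, using Python's sqlite3 instead of SQL Server (so string concatenation is `||` rather than `+`). The table contents, the index name `ix_url_suburl`, and the idea of indexing suburl are assumptions for illustration only; the point is that the equality on suburl narrows the candidate pairs before the LIKE is evaluated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE url (id INTEGER PRIMARY KEY, url TEXT, suburl TEXT, status TEXT);
    -- Hypothetical index: lets the engine look up candidate rows by suburl.
    CREATE INDEX ix_url_suburl ON url (suburl);
    INSERT INTO url (id, url, suburl, status) VALUES
        (1, 'http://ws.com/query?p1=a',      'query?p1=a', 'done'),
        (2, 'http://ws.com/query?p1=a&p2=b', 'query?p1=a', 'tocheck'),
        (3, 'http://ws.com/query?p1=c&p2=b', 'query?p1=c', 'tocheck');
""")

redundant = conn.execute("""
    SELECT tocheck.id, tocheck.url
    FROM url AS tocheck
    JOIN url AS done
      ON tocheck.id <> done.id
     AND tocheck.suburl = done.suburl        -- cheap equality, narrows the pairs
     AND tocheck.url LIKE done.url || '%'    -- prefix test on what is left
     AND done.status IN ('tocheck', 'todo', 'done')
    WHERE tocheck.status = 'tocheck'
""").fetchall()

print(redundant)  # only url 2 is covered by an already known shorter url
```

Note that LIKE treats `%` and `_` inside done.url as wildcards, so URLs containing those characters would need escaping.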

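I think the overhead comes from joining the table to itself on ids that are not equal: without an equality predicate, the join is effectively a Cartesian product that only excludes each row's pairing with itself, and the LIKE has to be evaluated for every remaining pair.
I suggest trying a correlated subquery instead. The outer query then touches only the 1,000 (as you mentioned) tocheck rows, and the subquery can stop at the first other URL that is a prefix of the current one. The subquery has to exclude the row itself, because every URL is a prefix of itself. Use not exists instead of exists to get the URLs that still have to be called:
select
    tocheck.*
from
    dbo.url tocheck
where
    tocheck.status = 'tocheck'
    and exists (
        select
            1
        from
            dbo.url done
        where
            done.id != tocheck.id
            and tocheck.url like done.url+'%'
            and done.status in ('tocheck','todo','done')
    )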
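To compare the join form from the question with a semi-join written with EXISTS, here is a minimal sketch in Python's sqlite3 (an assumption made for runnability; the sample rows are made up). It shows one practical difference: the join returns a tocheck row once per matching prefix, while EXISTS returns it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE url (id INTEGER PRIMARY KEY, url TEXT, status TEXT);
    INSERT INTO url (id, url, status) VALUES
        (1, 'http://ws.com/query?p1=a',           'done'),
        (2, 'http://ws.com/query?p1=a&p2=b',      'todo'),
        (3, 'http://ws.com/query?p1=a&p2=b&p3=c', 'tocheck'),
        (4, 'http://ws.com/query?p1=z',           'tocheck');
""")

# Join form: one output row per (tocheck, done) pair that matches.
join_ids = [row[0] for row in conn.execute("""
    SELECT tocheck.id
    FROM url AS tocheck
    JOIN url AS done
      ON tocheck.id <> done.id
     AND tocheck.url LIKE done.url || '%'
     AND done.status IN ('tocheck', 'todo', 'done')
    WHERE tocheck.status = 'tocheck'
    ORDER BY tocheck.id
""")]

# Semi-join form: a tocheck row qualifies as soon as one prefix is found.
exists_ids = [row[0] for row in conn.execute("""
    SELECT tocheck.id
    FROM url AS tocheck
    WHERE tocheck.status = 'tocheck'
      AND EXISTS (
          SELECT 1
          FROM url AS done
          WHERE done.id <> tocheck.id
            AND tocheck.url LIKE done.url || '%'
            AND done.status IN ('tocheck', 'todo', 'done'))
    ORDER BY tocheck.id
""")]

print(join_ids)    # [3, 3]: url 3 has two shorter prefixes (ids 1 and 2)
print(exists_ids)  # [3]
```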

Related

Peewee select query with multiple joins and multiple counts

I've been attempting to write a peewee select query which results in a table with 2 counts (one for the number of prizes associated with the lottery, and the other for the number of packages associated with the lottery), as well as the fields in the Lottery model.
I've managed to write select queries with 1 count working (seen below), and then I've had to convert the ModelSelects to lists and join them manually (which I think is very hacky).
I did manage to write a select query where the results were joined, but it would multiply the packages count with the prizes count (I've since lost that query).
I also tried using a .switch(Lottery) but I didn't have any luck with this.
query1 = (Lottery.select(Lottery, fn.count(Package.id).alias('packages'))
          .join(LotteryPackage)
          .join(Package)
          .order_by(Lottery.id)
          .group_by(Lottery)
          .dicts())
query2 = (Lottery.select(Lottery.id.alias('lotteryID'), fn.count(Prize.id).alias('prizes'))
          .join(LotteryPrize)
          .join(Prize)
          .group_by(Lottery)
          .order_by(Lottery.id)
          .dicts())
lottery = list(query1)
query3 = list(query2)
for x in range(len(lottery)):
    lottery[x]['prizes'] = query3[x]['prizes']
While the above code works, is there a cleaner way to write this query?
Your best bet is to do this with subqueries.
from peewee import JOIN, fn

# Create query which gets lottery id and count of packages.
L1 = Lottery.alias()
subq1 = (L1
         .select(L1.id, fn.COUNT(LotteryPackage.package).alias('packages'))
         .join(LotteryPackage, JOIN.LEFT_OUTER)
         .group_by(L1.id))
# Create query which gets lottery id and count of prizes.
L2 = Lottery.alias()
subq2 = (L2
         .select(L2.id, fn.COUNT(LotteryPrize.prize).alias('prizes'))
         .join(LotteryPrize, JOIN.LEFT_OUTER)
         .group_by(L2.id))
# Select from lottery, joining on each subquery and returning
# the counts.
query = (Lottery
         .select(Lottery, subq1.c.packages, subq2.c.prizes)
         .join(subq1, on=(Lottery.id == subq1.c.id))
         .join(subq2, on=(Lottery.id == subq2.c.id))
         .order_by(Lottery.name))
for row in query.objects():
    print(row.name, row.packages, row.prizes)
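To see why a single query that joins both link tables multiplies the counts, and why aggregating first avoids it, here is a plain-SQL sketch run through Python's sqlite3. The table and column names (`lottery`, `lottery_package`, `lottery_prize`) and the rows are hypothetical stand-ins for the models above, and the subqueries aggregate the link tables directly, which is a slight variation on the peewee subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lottery (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lottery_package (lottery_id INTEGER, package_id INTEGER);
    CREATE TABLE lottery_prize (lottery_id INTEGER, prize_id INTEGER);
    INSERT INTO lottery VALUES (1, 'spring'), (2, 'summer');
    INSERT INTO lottery_package VALUES (1, 10), (1, 11), (1, 12);
    INSERT INTO lottery_prize VALUES (1, 20), (1, 21);
""")

# Joining both link tables at once pairs every package with every prize,
# so both counts see 3 * 2 = 6 rows for the first lottery.
naive = conn.execute("""
    SELECT l.name, COUNT(lp.package_id), COUNT(lz.prize_id)
    FROM lottery AS l
    LEFT JOIN lottery_package AS lp ON lp.lottery_id = l.id
    LEFT JOIN lottery_prize AS lz ON lz.lottery_id = l.id
    GROUP BY l.id
    ORDER BY l.id
""").fetchall()

# Aggregating each link table first yields one row per lottery and subquery,
# so the final joins cannot multiply anything.
fixed = conn.execute("""
    SELECT l.name, COALESCE(p.packages, 0), COALESCE(z.prizes, 0)
    FROM lottery AS l
    LEFT JOIN (SELECT lottery_id, COUNT(*) AS packages
               FROM lottery_package GROUP BY lottery_id) AS p
           ON p.lottery_id = l.id
    LEFT JOIN (SELECT lottery_id, COUNT(*) AS prizes
               FROM lottery_prize GROUP BY lottery_id) AS z
           ON z.lottery_id = l.id
    ORDER BY l.id
""").fetchall()

print(naive)  # [('spring', 6, 6), ('summer', 0, 0)]
print(fixed)  # [('spring', 3, 2), ('summer', 0, 0)]
```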

Updating one table's column in SQL Server from another

I have a table of measurements from weather stations, with station names (in Hebrew):
I also have created a table of those weather stations with their latitudes and longitudes:
I've written a query that should update the first table with the lat/longs from the second, but it's not working:
update t1
set t1.MeasurementLat = t2.Latitude,
t1.MeasurementLong = t2.Longitude
from [dbo].[Measurements] as t1
inner join [dbo].[StationCoords] as t2 on t1.StationName like t2.Station
I think there is a problem with the way the station name is being read, and perhaps something to do with encoding, because this query brings back an empty result, too:
SELECT TOP (5) *
FROM [dbo].[Measurements]
WHERE [StationName] = 'אריאל מכללה';
Any ideas?
Your example names are not the same. Perhaps this will work:
update m
set MeasurementLat = sc.Latitude,
MeasurementLong = sc.Longitude
from dbo.[Measurements] m join
dbo.[StationCoords] sc
on m.StationName like sc.Station + '%';

Microsoft SQL Server: wrong query execution plan taking too long

This is on a Windows SQL Server cluster.
The query comes from a third-party application, so I cannot modify it permanently.
The query is:
DECLARE @FromBrCode INT = 1001
DECLARE @ToBrCode INT = 1637
DECLARE @Cdate DATE = '31-mar-2017'
SELECT
a.PrdCd, a.Name, SUM(b.Balance4) as Balance
FROM
D009021 a, D010014 b
WHERE
a.PrdCd = LTRIM(RTRIM(SUBSTRING(b.PrdAcctId, 1, 8)))
AND substring(b.PrdAcctId, 9, 24) = '000000000000000000000000'
AND a.LBrCode = b.LBrCode
AND a.LBrCode BETWEEN @FromBrCode AND @ToBrCode
AND b.CblDate = (SELECT MAX(c.CblDate)
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= @Cdate)
GROUP BY
a.PrdCd, a.Name
HAVING
SUM(b.Balance4) <> 0
ORDER BY
a.PrdCd
This particular query takes too long to complete. The same problem happens on a different SQL Server.
No table lock was found, and processor and memory usage are normal while the query is running.
A plain "select top 1000" against either table (D009021, D010014) returns output instantly.
Reindexing, rebuilding, and updating statistics on both tables (D009021, D010014) did not resolve the problem.
The same query works, but slowly, if we reduce the number of branches:
(
DECLARE @FromBrCode INT = 1001
DECLARE @ToBrCode INT = 1001
)
The same query runs faster, giving output within 2 minutes, if we replace either variable with its literal value:
AND a.LBrCode BETWEEN @FromBrCode AND @ToBrCode
changed to
AND a.LBrCode BETWEEN 1001 AND @ToBrCode
The same query also runs faster, giving output within 2 minutes, if we add "OPTION (RECOMPILE)" at the end.
I tried clearing the cached execution plan so that a new one would be compiled, but the problem still exists.
I found that the estimated plan and the actual execution plan are different (see screenshots).
Table D010014 is aliased twice, once as b and once as c, so it is joined to itself on PrdAcctId and LBrCode.
Try to remove the subquery below and create a temp table to store the values you need:
SELECT MAX(c.CblDate)
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= @Cdate
If you can't do that, then try:
SELECT TOP 1 c.CblDate
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= @Cdate
ORDER BY c.CblDate DESC
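The temp-table suggestion can be sketched as follows. This is an illustration in Python's sqlite3 rather than T-SQL, with a hypothetical table `balances` standing in for D010014, a hypothetical temp table `latest`, and made-up rows. It computes the latest CblDate per (PrdAcctId, LBrCode) once and joins to it, and checks that the result matches the correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE balances (PrdAcctId TEXT, LBrCode INTEGER, CblDate TEXT, Balance4 REAL);
    INSERT INTO balances VALUES
        ('A', 1001, '2017-03-30', 10.0),
        ('A', 1001, '2017-03-31', 15.0),
        ('A', 1001, '2017-04-01', 99.0),
        ('B', 1002, '2017-03-15', 7.0);
""")
cutoff = "2017-03-31"  # ISO dates stored as text compare correctly as strings

# Correlated subquery: the latest date is looked up again for every outer row.
correlated = conn.execute("""
    SELECT b.PrdAcctId, b.LBrCode, b.Balance4
    FROM balances AS b
    WHERE b.CblDate = (SELECT MAX(c.CblDate)
                       FROM balances AS c
                       WHERE c.PrdAcctId = b.PrdAcctId
                         AND c.LBrCode = b.LBrCode
                         AND c.CblDate <= ?)
    ORDER BY b.PrdAcctId
""", (cutoff,)).fetchall()

# Temp table: compute the latest date per account and branch once, then join.
conn.execute("CREATE TEMP TABLE latest (PrdAcctId TEXT, LBrCode INTEGER, CblDate TEXT)")
conn.execute("""
    INSERT INTO latest
    SELECT PrdAcctId, LBrCode, MAX(CblDate)
    FROM balances
    WHERE CblDate <= ?
    GROUP BY PrdAcctId, LBrCode
""", (cutoff,))
via_temp = conn.execute("""
    SELECT b.PrdAcctId, b.LBrCode, b.Balance4
    FROM balances AS b
    JOIN latest AS l
      ON l.PrdAcctId = b.PrdAcctId
     AND l.LBrCode = b.LBrCode
     AND l.CblDate = b.CblDate
    ORDER BY b.PrdAcctId
""").fetchall()

print(correlated)  # [('A', 1001, 15.0), ('B', 1002, 7.0)]
print(via_temp)    # same rows
```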

SQL Server LEFT JOIN

This query has been keeping me busy for the last couple of days. I tried to rewrite it with different ideas but I keep having the same problem. To simplify the problem I put part of my query in a view, this view returns 23 records. Using a left join I would like to add fields coming from the table tblDatPositionsCalc to these 23 records. As you can see I have an additional condition on the tblDatPositionsCalc in order to only consider the most recent records. With this condition it would return 21 records. The join should be on two fields together colAccount and colId.
I simply want the query to return the 23 records from the view and, where possible, have the information from tblDatPositionsCalc. There are actually only 2 records in the view without a corresponding id and account in tblDatPositionsCalc, which means that out of the 23 records only 2 will have missing values in the fields coming from the table tblDatPositionsCalc.
The problem with my query is that it only returns the 21 records from tblDatPositionsCalc. I don't understand why. I tried to move the condition on the date to just after the JOIN condition, but that did not help.
SELECT TOP (100) PERCENT
dbo.vwCurrPos.Account,
dbo.vwCurrPos.Id,
dbo.vwCurrPos.TickerBB,
dbo.vwCurrPos.colEquityCode,
dbo.vwCurrPos.colType,
dbo.vwCurrPos.colCcy,
dbo.vwCurrPos.colRegion,
dbo.vwCurrPos.colExchange,
dbo.vwCurrPos.[Instr Type],
dbo.vwCurrPos.colMinLastDay,
dbo.vwCurrPos.colTimeShift,
dbo.vwCurrPos.Strike,
dbo.vwCurrPos.colMultiplier,
dbo.vwCurrPos.colBetaVol,
dbo.vwCurrPos.colBetaEq,
dbo.vwCurrPos.colBetaFloor,
dbo.vwCurrPos.colBetaCurv,
dbo.vwCurrPos.colUndlVol,
dbo.vwCurrPos.colUndlEq,
dbo.vwCurrPos.colUndlFut,
tblDatPositionsCalc_1.colLots,
dbo.vwCurrPos.[Open Positions],
dbo.vwCurrPos.colListMatShift,
dbo.vwCurrPos.colStartTime,
tblDatPositionsCalc_1.colPrice,
tblDatPositionsCalc_1.colMktPrice,
dbo.vwCurrPos.colProduct,
dbo.vwCurrPos.colCalendar,
CAST(dbo.vwCurrPos.colExpiry AS DATETIME) AS colExpiry,
dbo.vwCurrPos.colEndTime,
CAST(tblDatPositionsCalc_1.colDate AS datetime) AS colDate,
dbo.vwCurrPos.colFund,
dbo.vwCurrPos.colExchangeTT,
dbo.vwCurrPos.colUserTag
FROM dbo.vwCurrPos
LEFT OUTER JOIN dbo.tblDatPositionsCalc AS tblDatPositionsCalc_1
ON tblDatPositionsCalc_1.colId = dbo.vwCurrPos.Id
AND tblDatPositionsCalc_1.colAccount = dbo.vwCurrPos.Account
WHERE (tblDatPositionsCalc_1.colDate =
(SELECT MAX(colDate) AS Expr1 FROM dbo.tblDatPositionsCalc))
ORDER BY
dbo.vwCurrPos.Account,
dbo.vwCurrPos.Id,
dbo.vwCurrPos.colEquityCode,
dbo.vwCurrPos.colRegion
Any idea what might cause the problem?
(Option 1) DrCopyPaste is right so your from clause would look like:
...
FROM dbo.vwCurrPos
LEFT OUTER JOIN dbo.tblDatPositionsCalc AS tblDatPositionsCalc_1
ON tblDatPositionsCalc_1.colId = dbo.vwCurrPos.Id
AND tblDatPositionsCalc_1.colAccount = dbo.vwCurrPos.Account
and (tblDatPositionsCalc_1.colDate =
(SELECT MAX(colDate) AS Expr1 FROM dbo.tblDatPositionsCalc))
...
Reason: a WHERE condition of the form "left-joined column = some expression" fails for the unmatched rows, because there the column is NULL and "NULL = something" is never true, so those rows are removed.
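The difference between filtering in WHERE and filtering in ON can be reproduced with a tiny example. This sketch uses Python's sqlite3 with hypothetical tables `pos` (standing in for the view) and `calc` (standing in for tblDatPositionsCalc) and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pos (account TEXT, id INTEGER);
    CREATE TABLE calc (account TEXT, id INTEGER, col_date TEXT, price REAL);
    INSERT INTO pos VALUES ('ACC', 1), ('ACC', 2), ('ACC', 3);
    INSERT INTO calc VALUES
        ('ACC', 1, '2014-01-02', 10.0),
        ('ACC', 2, '2014-01-01', 20.0),   -- older row, should be ignored
        ('ACC', 2, '2014-01-02', 21.0);   -- position 3 has no calc row at all
""")

# Date filter in WHERE: the unmatched position has col_date NULL and is dropped.
in_where = conn.execute("""
    SELECT pos.id, calc.price
    FROM pos
    LEFT JOIN calc ON calc.id = pos.id AND calc.account = pos.account
    WHERE calc.col_date = (SELECT MAX(col_date) FROM calc)
    ORDER BY pos.id
""").fetchall()

# Date filter in ON: it only restricts which calc rows may match,
# so every position is kept and unmatched ones get NULL.
in_on = conn.execute("""
    SELECT pos.id, calc.price
    FROM pos
    LEFT JOIN calc ON calc.id = pos.id AND calc.account = pos.account
                  AND calc.col_date = (SELECT MAX(col_date) FROM calc)
    ORDER BY pos.id
""").fetchall()

print(in_where)  # [(1, 10.0), (2, 21.0)]
print(in_on)     # [(1, 10.0), (2, 21.0), (3, None)]
```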
(Option 2) As opposed to pushing code into additional views, where it is harder to maintain, you can nest SQL select statements:
select
X.x1,X.x2,
Y.*
from X
left join
(select Z.z1 as y1, Z.z2 as y2, Z.z3 as y3
from Z
where Z.z1 = (select max(Z.z1) from Z)
) as Y
on X.x1 = Y.y1 and X.x2 = Y.y2
The advantage here is that you can check each nested subquery on its own and move on quickly. If you are still building up more logic, though, check out common table expressions (CTEs): http://msdn.microsoft.com/en-us/library/ms175972.aspx

mysql complex select query from multiple tables

Table visits:
fields [id, patient_id (fk), doctor_id (fk), flag (Xfk), type (Xfk), time_booked, date, ...]
Xfk = it refers to another table, but the referenced row does not have to exist, so I don't use a constraint.
SELECT `v`.`date`, `v`.`time_booked`, `v`.`stats`, `p`.`name` as pt_name,
`d`.`name` as dr_name, `f`.`name` as flag_name, `f`.`color` as flag_color,
`vt`.`name` as type, `vt`.`color` as type_color
FROM (`visits` v, `users` p, `users` d, `flags` f, `visit_types` vt)
WHERE `p`.`id`=`v`.`patient_id`
AND `d`.`id`=`v`.`doctor_id`
AND `v`.`flag`=`f`.`id`
AND `v`.`type`=`vt`.`id`
AND `v`.`date` >= '2013-02-27'
AND (v.date <= DATE_ADD('2013-02-27', INTERVAL 7 DAY))
AND (`v`.`doctor_id`='00002' OR `v`.`doctor_id`='00001')
ORDER BY `v`.`date` ASC, `v`.`time_booked` ASC;
That is one big statement!
My questions are:
1: Should I consider using joins instead of selecting from multiple tables, and if so, why?
The execution time of this query is 0.0009, so I think it's fine, since I get all my data in one query. Or is it bad practice?
2: In the select part I want to say:
if v.type != 0, select f.name and f.color; otherwise I don't want to select them, nor their table flags f.
Is that possible?
Also, currently, if the flag is not found, every row is replicated as many times as there are rows in the flags table! Is there a way to prevent this, for both the flags and visit_types tables?
If it's running fast, I wouldn't mess with it. I generally prefer to use joins instead of matching stuff in the where clause.
Any chance you'd remove the ` characters? Just makes it a bit harder to read in my opinion.
Look at the case statement for MySQL: http://dev.mysql.com/doc/refman/5.0/en/case.html
select case when v.type <> 0 then
f.name
else
''
end as name, ...
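Here is a minimal runnable sketch of the CASE expression combined with an explicit join, using Python's sqlite3 rather than MySQL. The two-table schema and rows are simplified stand-ins for the tables in the question, and the choice of LEFT JOIN plus COALESCE is my assumption for the case where the referenced flag row does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, flag INTEGER, type INTEGER);
    INSERT INTO flags VALUES (1, 'urgent');
    INSERT INTO visits VALUES
        (1, 1, 2),    -- flag exists, type <> 0
        (2, 1, 0),    -- type = 0, flag name should be blank
        (3, 99, 2);   -- flag 99 does not exist in flags
""")

rows = conn.execute("""
    SELECT v.id,
           CASE WHEN v.type <> 0 THEN COALESCE(f.name, '') ELSE '' END AS flag_name
    FROM visits AS v
    LEFT JOIN flags AS f ON f.id = v.flag   -- keeps visits whose flag is missing
    ORDER BY v.id
""").fetchall()

print(rows)  # [(1, 'urgent'), (2, ''), (3, '')]
```

Each visit appears exactly once: the LEFT JOIN neither drops visits with a missing flag nor repeats them.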
