OR in WHERE statement slowing things down dramatically - sql-server

I have the following query that finds customers related to an order. The customer has a legacy ID, so I have to check both the old (legacy) ID and the customer ID, hence the OR condition:
SELECT
c.Title,
c.Name
FROM productOrder po
INNER JOIN Employee e ON po.BookedBy = e.ID
CROSS APPLY (
SELECT TOP 1 *
FROM Customer c
WHERE(po.CustID = c.OldID OR po.CustID = c.CustID)
) c
GROUP BY
c.CustomerId, c.Title, c.FirstName, c.LastName
If I remove the OR and test either condition on its own, the query runs fine. There is an index on both the customer ID and the legacy ID.

For the Customer table, you need separate indexes on the OldID and CustID columns. If you already have a clustered index on CustID, add an index on OldID as well:
CREATE INDEX customer_oldid_idx ON customer(oldid);
Without this index, the search on OldID in this clause:
WHERE (po.CustID = c.OldID OR po.CustID = c.CustID)
has to fall back to a full table scan, which will be very slow.
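If both indexes already exist, as the question states, and the OR is still slow, one commonly tried alternative is to split the OR into two single-column lookups combined with UNION ALL, so that each branch can be answered from its own index. The following is only a sketch, not part of the original answer: it reuses the table and column names from the question (CustID, OldID, Title, Name), leaves out the question's GROUP BY, and, like the original TOP 1 without ORDER BY, returns an arbitrary match when both branches find a row.

```sql
-- Sketch: each branch filters on one column only, so each can use its own index.
SELECT
    c.Title,
    c.Name
FROM productOrder po
INNER JOIN Employee e ON po.BookedBy = e.ID
CROSS APPLY (
    SELECT TOP 1 u.Title, u.Name
    FROM (
        -- Match on the current customer ID
        SELECT c1.Title, c1.Name
        FROM Customer c1
        WHERE c1.CustID = po.CustID
        UNION ALL
        -- Match on the legacy ID
        SELECT c2.Title, c2.Name
        FROM Customer c2
        WHERE c2.OldID = po.CustID
    ) u
) c;
```

Whether this is actually faster depends on the data and on the plan the optimizer picks, so compare the execution plans of both versions before adopting it.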

Related

Efficiently group query by one column, taking the maximum of another column and a third column that comes from the same row as the maximum column

I have a table of 100,000,000+ values, so efficiency is very important to me. I need to take information from table A, join it to an index table B, then join to table C using the index retrieved from table B. The problem is, there are multiple indexes for each value in table A, and I want to retrieve the one with the most recent date.
The query below creates duplicates:
SELECT ID_1, ID_2, Date
INTO #DEST_TABLE FROM Table_1 t1
INNER JOIN Table_2 t2 ON t1.ID_1=t2.ID_1
INNER JOIN Table_3 t3 ON t2.ID_2=t3.ID_2
This one does not, but when going from 35,000 to 40,000 elements, the execution time jumps from under 5 seconds to over 1 minute:
SELECT ID_1, ID_2, Date
INTO #DEST_TABLE FROM
(SELECT * FROM Table_1 t1 CROSS APPLY Table_2 t2 WHERE t1.ID_1=t2.ID_1) t_temp
LEFT JOIN Table_3 t3 ON t_temp.ID_2=t3.ID_2
How can I decrease my execution time as much as possible?
Here is an example table:
For this table, I would be trying to get the most recent location for each person.
None of the columns are indexed and I cannot create indexes on this table.

First of all, when you are working with 100 million+ records and joining them to other tables, the first thing I would ask is: what is the rationale for not creating indexes that can cover your query? If you are not the admin of that system, I would suggest bringing this up with the admin group and trying to understand the exact reason (if any) they do not want an index on that huge table, especially because you mentioned "efficiency is very important to me".
Remember that 'SQL Tuning' is only one of the steps of 'Database Performance Tuning', and you can only tune so much by writing a good SQL query. When the data volume gets huge, a good SQL query is never sufficient without other performance tuning measures.
Apart from what Roger has already provided, here are a few solutions that you can try out:
Solution 1:
SELECT T1.ID_1, OA.ID_2, OA.Location
FROM Table1 T1
OUTER APPLY (
SELECT TOP 1 T3.ID_2, T3.Location
FROM Table2 T2
INNER JOIN Table3 T3
ON T2.ID_2 = T3.ID_2
WHERE T2.ID_1 = T1.ID_1
ORDER BY T3.Date DESC
) OA;
Solution 2:
SELECT DISTINCT
T1.ID_1
,ID_2 = FIRST_VALUE(T3.ID_2) OVER (PARTITION BY T1.ID_1 ORDER BY T3.Date DESC)
,Location = FIRST_VALUE(T3.Location) OVER (PARTITION BY T1.ID_1 ORDER BY T3.Date DESC)
FROM Table1 T1
INNER JOIN Table2 T2
ON T1.ID_1 = T2.ID_1
INNER JOIN Table3 T3
ON T2.ID_2 = T3.ID_2;
Data Preparation:
DROP TABLE IF EXISTS Table1
DROP TABLE IF EXISTS Table2
DROP TABLE IF EXISTS Table3
SELECT TOP 10000 ID_1 = object_id, name
INTO Table1
FROM sys.all_objects
ORDER BY object_id
SELECT ID_1 = T1.ID_1, ID_2 = IDENTITY(INT, 1, 1)
INTO Table2
FROM Table1 T1
CROSS JOIN Table1 T2
SELECT ID_2, Location = 'City_'+ CAST(ID_2 AS VARCHAR(100)), Date = CAST(DATEADD(DAY, ID_2/10000, GETDATE()) AS DATE)
INTO Table3
FROM Table2
Indexes to cover the Solution 1:
CREATE NONCLUSTERED INDEX IX_TABLE1_ID_1 ON Table1 (ID_1)
CREATE NONCLUSTERED INDEX IX_TABLE2_ID_2 ON Table2 (ID_1, ID_2)
CREATE NONCLUSTERED INDEX IX_TABLE3_ID_2 ON Table3 (ID_2, Date DESC) INCLUDE (Location)
Execution Plan:
You can see that all are 'Index Seek' except for Table1, which is a legitimate 'Index Scan' because the query visits every ID_1 value in Table1. If you put a WHERE clause on the outer query to search for a few specific ID_1 values, that 'Index Scan' will turn into an 'Index Seek' as well.
I will leave the index strategy for the second solution to you (as homework :) ). Tip: you have to make Location a key column as well. Or you can go with a COLUMNSTORE index approach.

You can use something like this:
select top (1) with ties
a.A_Id, b.B_Id, b.Date
from dbo.TableA a
inner join dbo.TableB b on a.A_Id = b.A_Id
inner join dbo.TableC c on c.B_Id = b.B_Id
order by row_number() over(partition by a.A_Id order by b.Date desc);
Alternatively, you can try an olde fashioneth approache:
select a.A_Id, b.B_Id, b.Date
from dbo.TableA a
inner join dbo.TableB b on a.A_Id = b.A_Id
inner join dbo.TableC c on c.B_Id = b.B_Id
where not exists (
select 0 from dbo.TableB pb where pb.A_Id = b.A_Id and pb.Date > b.Date
);
However, as with all such situations, performance will depend heavily on indexes. SSMS can suggest some if you look at the execution plan; off the top of my head, you will need all Id columns to be indexed, and you will need either a single-column index on (Date) or a composite index on (A_Id, Date, B_Id) on TableB.
UPD: If you can't create or modify any indices, and performance is paramount, I would suggest copying the data in question into a separate schema or database, where you might have appropriate permissions. Apart from that... it's impossible to get something out of nothing.

How to do sub-query correctly while selecting two tables in Oracle?

I need to do a sub-query from a table to find all employees working in the same department that is part of the same city, but I'm not getting it.
I have the following tables:
Table departments
DEPARTMENTS
department_id
department_name
location_id
Table locations
LOCATIONS
location_id
street_address
postal_code
city
state_province
country_id
Table employees
EMPLOYEES
employee_id
first_name
last_name
email
phone_number
hire_date
job_id
department_id
My code right now is something like this:
SELECT
first_name,
department_id,
job_id
FROM employees
WHERE state_province = (SELECT state_province FROM locations
WHERE state_province = 'Sao Paulo');
The problem is that while I want to filter on state_province from the locations table, I also need to select the name, department id and job id from the employees table. How can I use both tables in the sub-query?
Anyway, sorry if I did something wrong in the code, I am new to sub-queries.

You could try doing a join between the two tables instead:
SELECT
e.first_name,
e.department_id,
e.job_id,
l.* -- replace with columns you really want
FROM employees e
INNER JOIN locations l
ON e.state_province = l.state_province
WHERE
e.state_province = 'Sao Paulo';
I don't know which columns you want to select from locations, but it doesn't really make sense to do a join just for state_province alone, as the employees table already has this column. So I just included location.* as a placeholder which you can replace with the columns you actually want.
Edit:
A join is the way to go here IMO, but if you absolutely need to use a subquery, then you can move your current subquery from the WHERE clause to the SELECT clause:
SELECT
first_name,
department_id,
job_id,
(SELECT l.state_province FROM locations l
WHERE e.state_province = l.state_province) state_province
FROM employees e;
Note that this will only work if there is one matching province. For this and performance reasons, my join query is probably what you would want to use in practice.

I think in your case a sub-query may not be necessary.
Joining the tables can do the trick.
SELECT e.first_name, e.department_id, e.job_id, l.state_province
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
LEFT JOIN locations l ON d.location_id = l.location_id
WHERE l.state_province = 'Sao Paulo';
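If a sub-query is required, as the question asks, the same path through departments and locations can be placed inside an IN clause. This is a sketch that uses only the tables and columns listed in the question; the 'Sao Paulo' filter value is taken from the question's own attempt.

```sql
-- Employees whose department is located in the given state/province.
SELECT e.first_name,
       e.department_id,
       e.job_id
FROM employees e
WHERE e.department_id IN (
    -- Departments whose location lies in 'Sao Paulo'
    SELECT d.department_id
    FROM departments d
    INNER JOIN locations l ON l.location_id = d.location_id
    WHERE l.state_province = 'Sao Paulo'
);
```

The sub-query only filters rows; it cannot contribute columns to the result. If columns from locations are needed in the output, the join form is the one to use.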

SQL SELECT from SELECT

I am trying to build a single select statement from two separate ones.
Basically I have a list of Names in a table which do repeat like so:
Name| Date
John 2014-11-22
John 2013-02-03
Joe 2012-12-12
Jack 2011-11-11
Bob 2010-10-01
Bob 2013-12-22
I need to do a Select distinct Name from Records which returns John, Joe, Jack, Bob.
I then want to do a Select on another table where I pass in the rows returned above.
SELECT Address, Phone From dbo.Details
WHERE Name = {Values from first SELECT query}
Having trouble with the syntax.

If you do not want to return any values from the subquery, you can use either IN or EXISTS
SELECT Address, Phone From dbo.Details
WHERE Name IN (SELECT DISTINCT Name FROM Records)
-- OR --
SELECT Address, Phone From dbo.Details D
WHERE EXISTS (SELECT 1 FROM Records R WHERE R.Name = D.Name)
(In most RDBMS the EXISTS is less resource intensive).
If you want to return values from the subquery, you should use JOIN
SELECT
D.Address,
D.Phone,
R.Name -- For example
FROM
dbo.Details D
INNER JOIN dbo.Records R
ON D.Name = R.Name
SIDENOTE These are sample queries, it is possible that you have to fine tune them to match your exact requirements.

You can use:
SELECT Address, Phone, name
FROM details
-- "in" is the difference from your first query, needed due to multiple values being returned by the subquery
WHERE name in (
SELECT distinct name
FROM namesTable
)
Additionally the following should work:
SELECT d.Address, d.Phone, n.name
FROM details d
inner join (
select distinct name
from namesTable
) n on d.name = n.name

There are two ways you can go about this. One is to create a temporary table and join to it (in retrospect, you could also join to the first query as a subquery, or use a CTE if you're on SQL Server; the modifications for that route should be fairly obvious):
CREATE TEMPORARY TABLE my_table AS
{your first select query};
SELECT d.Address, d.Phone FROM dbo.Details AS d
INNER JOIN my_table AS mt
ON mt.name = d.name
Another option would be to perform an IN or EXISTS query using your select query
SELECT Address, Phone From dbo.Details
WHERE name IN (SELECT name from my_table)
Or, better yet (see e.g. "SQL Server IN vs. EXISTS Performance"),
SELECT Address, Phone From dbo.Details
WHERE EXISTS (SELECT * from my_table WHERE my_table.name = dbo.Details.name)
You might have to modify the syntax slightly, depending on whether you are using MySQL or SQL Server (I'm not sure about the latter, honestly). But this should get you started down the right path.
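The CTE route mentioned above could look roughly like the following on SQL Server. It is a sketch that assumes the names come from a table called dbo.Records with a Name column, as in the question; DistinctNames is just an illustrative name for the CTE.

```sql
-- The CTE plays the role of the temporary table: one row per distinct name.
WITH DistinctNames AS (
    SELECT DISTINCT Name
    FROM dbo.Records
)
SELECT d.Address, d.Phone
FROM dbo.Details AS d
INNER JOIN DistinctNames AS n
    ON n.Name = d.Name;
```

Because the CTE returns each name once, the join does not multiply the rows from dbo.Details.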

This will give you the names and their address and phone number:
SELECT DISTINCT N.Name, D.Address, D.Phone
FROM dbo.Details D INNER JOIN dbo.Names N ON D.Name = N.Name

When using a subquery that is not scalar (doesn't return only one value) in the where clause use IN and of course only one column in the subquery:
SELECT Address, Phone
From dbo.Details
WHERE Name IN (Select Name from Table)

Need help creating a query for a non-normalized database

I've never worked with a non-normalized database before, so I'll try and explain my problem as best I can. So I have two tables:
The customers table holds all the customers information, and the orders table holds all the orders that they have placed. I haven't listed all the fields in the tables, just the ones that I need. The customer number in both tables is not the primary key, but I'm inner joining on them anyway. So the problem I'm having is that I don't know how to make a query that:
Selects all the customers with their first name, last name, and email, and also show the most recent orderdate, most recent total, and most recent ordertype. I know that I have to use a max() aggregate for the date, but that's as far as I got. Please help a noob out.

You can try:
SELECT FirstName,
LastName,
Email,
OrderDate,
OrderTotal,
OrderType
FROM Customers AS C
INNER JOIN Orders AS O
ON O.CustomerNumber = C.CustomerNumber AND
O.OrderDate = (
SELECT MAX (O1.OrderDate)
FROM Orders AS O1
WHERE O1.CustomerNumber = C.CustomerNumber
)

Assuming that Orders.OrderDate is unique for each CustomerNumber, does this work for you? If a single CustomerNumber has more than one entry in Orders for the most recent OrderDate, you'll get each of those rows.
select c.FirstName, c.LastName, c.Email, o.OrderDate, o.OrderTotal, o.OrderType
from Customers c
join
(select CustomerNumber, max(OrderDate) as MostRecentOrderDate
from Orders
group by CustomerNumber
) mro on mro.CustomerNumber=c.CustomerNumber
join Orders o on o.OrderDate=mro.MostRecentOrderDate and
o.CustomerNumber=mro.CustomerNumber
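If ties on OrderDate are possible and exactly one order per customer is wanted, a ROW_NUMBER() variant is one way to get it. This is a sketch assuming SQL Server (or another engine with window functions) and the table and column names used in the answers above; RankedOrders and rn are illustrative names.

```sql
-- Number each customer's orders from newest to oldest, then keep only row 1.
WITH RankedOrders AS (
    SELECT
        o.CustomerNumber,
        o.OrderDate,
        o.OrderTotal,
        o.OrderType,
        ROW_NUMBER() OVER (
            PARTITION BY o.CustomerNumber
            ORDER BY o.OrderDate DESC
        ) AS rn
    FROM Orders AS o
)
SELECT c.FirstName, c.LastName, c.Email,
       r.OrderDate, r.OrderTotal, r.OrderType
FROM Customers AS c
INNER JOIN RankedOrders AS r
    ON r.CustomerNumber = c.CustomerNumber
WHERE r.rn = 1;
```

When two orders share the latest date, which one gets rn = 1 is arbitrary unless a tie-breaker column is added to the ORDER BY. Since the customer number is not a key in this schema, duplicate customer rows would still produce duplicate output rows.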

Try this:
SELECT
Customers.*, Orders.*
FROM
Customers
JOIN
(SELECT
Customer_Number,
MAX(Order_Date) OrderDate
FROM
Orders
GROUP BY
Customer_Number
) as Ord ON Customers.Customer_Number = Ord.Customer_Number
JOIN Orders ON Orders.Customer_Number = Ord.Customer_Number AND Orders.Order_Date = Ord.OrderDate

If you are doing this with SQL Server, you can use the query designer. Basically, all you want is a join, since both tables share the same key: join Customer to Orders on the customer number, aliasing the Customer table as C and the Orders table as O.
So, for example:
SELECT C.*, O.*
FROM Customer C INNER JOIN Orders O ON C.[Customer Number] = O.[Customer Number]
This should be enough to get you started. If you don't want all the fields, then qualify the column names, for example:
SELECT C.FirstName, C.LastName, O.OrderDate, O.OrderType FROM Customer C, Orders O
WHERE C.[Customer Number] = O.[Customer Number] -- this is another way of doing a join, using the WHERE clause

FreeText COUNT query on multiple tables is super slow

I have two tables:
**Product**
ID
Name
SKU
**Brand**
ID
Name
Product table has about 120K records
Brand table has 30K records
I need to find count of all the products with name and brand matching a specific keyword.
I use freetext 'contains' like this:
SELECT count(*)
FROM Product
inner join Brand
on Product.BrandID = Brand.ID
WHERE (contains(Product.Name, 'pants')
or
contains(Brand.Name, 'pants'))
This query takes about 17 secs.
I rebuilt the FreeText index before running this query.
If I only check Product.Name, the query takes less than 1 sec. The same is true if I only check Brand.Name. The issue occurs only when I use the OR condition.
If I switch query to use LIKE:
SELECT count(*)
FROM Product
inner join Brand
on Product.BrandID = Brand.ID
WHERE Product.Name LIKE '%pants%'
or
Brand.Name LIKE '%pants%'
It takes 1 sec.
I read on MSDN that: http://msdn.microsoft.com/en-us/library/ms187787.aspx
To search on multiple tables, use a joined table in your FROM clause to search on a result set that is the product of two or more tables.
So I added an INNER JOINED table to FROM:
SELECT count(*)
FROM (select Product.Name ProductName, Product.SKU ProductSKU, Brand.Name as BrandName FROM Product
inner join Brand
on product.BrandID = Brand.ID) as TempTable
WHERE
contains(TempTable.ProductName, 'pants')
or
contains(TempTable.BrandName, 'pants')
This results in error:
Cannot use a CONTAINS or FREETEXT predicate on column 'ProductName' because it is not full-text indexed.
So the question is: why would the OR condition cause such a slow query?

After a bit of trial and error I found a solution that seems to work. It involves creating an indexed view:
CREATE VIEW [dbo].[vw_ProductBrand]
WITH SCHEMABINDING
AS
SELECT dbo.Product.ID, dbo.Product.Name, dbo.Product.SKU, dbo.Brand.Name AS BrandName
FROM dbo.Product INNER JOIN
dbo.Brand ON dbo.Product.BrandID = dbo.Brand.ID
GO
CREATE UNIQUE CLUSTERED INDEX IX_VW_PRODUCTBRAND_ID
ON vw_ProductBrand (ID);
GO
If I run the following query:
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
SELECT count(*)
FROM Product
inner join vw_ProductBrand
on Product.BrandID = vw_ProductBrand.ID
WHERE (contains(vw_ProductBrand.Name, 'pants')
or
contains( vw_ProductBrand.BrandName, 'pants'))
It now takes 1 sec again.

I ran into a similar problem, but I fixed it with UNION, something like:
SELECT *
FROM Product
inner join Brand
on Product.BrandID = Brand.ID
WHERE contains(Product.Name, 'pants')
UNION
SELECT *
FROM Product
inner join Brand
on Product.BrandID = Brand.ID
WHERE contains(Brand.Name, 'pants')
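Since the question asks for a count rather than the rows themselves, the UNION idea can be adapted by selecting only the product ID in each branch and counting the combined result. This is a sketch using the table and column names from the question; it assumes full-text indexes exist on Product.Name and Brand.Name, and matched is just an illustrative alias.

```sql
-- Each branch uses a single CONTAINS predicate; UNION removes products
-- that match on both the product name and the brand name.
SELECT COUNT(*)
FROM (
    SELECT p.ID
    FROM Product p
    INNER JOIN Brand b ON p.BrandID = b.ID
    WHERE CONTAINS(p.Name, 'pants')
    UNION
    SELECT p.ID
    FROM Product p
    INNER JOIN Brand b ON p.BrandID = b.ID
    WHERE CONTAINS(b.Name, 'pants')
) AS matched;
```

UNION (rather than UNION ALL) matters here: a product that matches in both branches is counted once, which is what the original OR condition would return.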

Have you tried something like:
SELECT count(*)
FROM Product
INNER JOIN Brand ON Product.BrandID = Brand.ID
WHERE CONTAINS((Product.Name, Brand.Name), 'pants')
