SQL View Optimization

I am trying to build a view that does basically two things: check whether a record in table 1 is in table 2, and check whether a link to another table is still there. It worked on a subset of the data, but when I tried to run the full query it timed out in the view designer.
The view worked fine until I added the check for whether the link to another table was present.
Initially it joined Table A to Table B and filtered out rows where A.ID wasn't present in the ID column of Table B.
I was then told that if the link between the person and the address table (stored in Table C) was removed, we would have no way of knowing other than to get a full extract of that table again and see which links are no longer present. I am trying to use that check to decide whether to display data in particular columns.
I am using the following structure close to 60 times to choose whether to show information in a column:
Column1 = case when exists (select LinkID from LinkTable C
where cast(C.LinkAddressID as varchar) = A.AddressID
and cast(C.LinkID as varchar) = A.ID)
then Column1
else NULL
end
There are about 1.6m records in Table A and just over 4m records in the Link table.
Is there a better way to write this query / view that would be more optimized?
Please let me know if more information is needed.

Select C.LinkID
From A
Left Join C On C.LinkAddressID = A.AddressID And C.LinkID = A.ID
This will give you C.LinkID if a match exists on both conditions, and NULL otherwise.
Indexes or keys on the columns used in the join clause, such as a primary key on A.ID and foreign key relationships matching the join, should help performance considerably.

As Joe suggested, if all 60 columns use the same AddressID and ID fields to match the two tables, I believe you can use something like the following query:
SELECT
Column1 = CASE WHEN C.LinkID IS NULL THEN NULL ELSE A.Column1 END,
....
FROM A
Left Join LinkTable C
ON C.LinkAddressID = A.AddressID AND C.LinkID = A.ID
Casting a column to another data type in a join or WHERE predicate generally prevents the optimizer from using an index seek on that column, so avoid casts in joins and WHERE clauses where possible.
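To see the single-join pattern at a small scale, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server. The rows, the Column2 column and the composite key on LinkTable are assumptions made for illustration, and the key columns are assumed to share a data type so that no cast is needed.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, AddressID INTEGER,
                    Column1 TEXT, Column2 TEXT);
    -- Assumed key: at most one link row per (LinkID, LinkAddressID) pair.
    CREATE TABLE LinkTable (LinkID INTEGER, LinkAddressID INTEGER,
                            PRIMARY KEY (LinkID, LinkAddressID));
    INSERT INTO A VALUES (1, 10, 'a1', 'b1'), (2, 20, 'a2', 'b2'), (3, 30, 'a3', 'b3');
    -- The link for person 2 has been removed.
    INSERT INTO LinkTable VALUES (1, 10), (3, 30);
""")

# One LEFT JOIN feeds every CASE expression instead of one EXISTS per column.
rows = conn.execute("""
    SELECT A.ID,
           CASE WHEN C.LinkID IS NULL THEN NULL ELSE A.Column1 END AS Column1,
           CASE WHEN C.LinkID IS NULL THEN NULL ELSE A.Column2 END AS Column2
    FROM A
    LEFT JOIN LinkTable C
           ON C.LinkAddressID = A.AddressID AND C.LinkID = A.ID
    ORDER BY A.ID
""").fetchall()

for row in rows:
    print(row)
```

The row for person 2 comes back with NULLs in both data columns. Note that the join only stays one-to-one because of the assumed key on LinkTable; if the link table can hold duplicate pairs, the LEFT JOIN would duplicate rows from A where EXISTS would not.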

Related

Rows 'not in' and rows 'in' does not equal total rows in table

I still have much to learn in database work, so please be kind.
I am attempting to combine two tables that have similar data, but wanted to be sure that I wasn't duplicating any entries. I decided to use the query below to see how many names were already in the target table
select A.Name
From SourceTable A
where Name NOT IN
(
select B.Name
From [Production].[dbo].[DestinationTable] B
)
This returned 0 rows, so I assumed that every Name was already in the target table. But when I changed the query to
select A.Name
From SourceTable A
where Name IN
(
select B.Name
From [Production].[dbo].[DestinationTable] B
)
I got back about half of the total rows in the source table. How can these two totals not add up to the total number of rows in the source table? I assumed duplicate names, but the numbers still don't add up. What could I be missing here?

Kamil's answer is a good explanation of what's going on with IN and NOT IN. But a better way to see if your destination table is missing any names from the source table would be to use a LEFT JOIN and check for NULL.
The query would look like this:
SELECT A.Name
FROM SourceTable A
LEFT JOIN [Production].[dbo].[DestinationTable] B ON A.Name = B.Name
WHERE B.Name IS NULL
This would return all names from your source that aren't in your destination.

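The reason the two queries do not add up to the total row count is that you have NULL values in the Name column of your DestinationTable. When the subquery returns a NULL, Name NOT IN (...) can never evaluate to true: for a name with no match, the comparison against the NULL is unknown, so the row is filtered out.
Neither query accounts for NULL values. Source rows whose Name is NULL are returned by neither query; you could add OR Name IS NULL to the outer WHERE clause to see them.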
Check it using
select count(*) from destinationtable where name is null
Alternatively you could perform a FULL OUTER JOIN on Name to see for yourself where the data doesn't match and inspect why.
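The effect of a single NULL on NOT IN can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module, which follows the same three-valued logic for IN and NOT IN; the names in the tables are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SourceTable (Name TEXT);
    CREATE TABLE DestinationTable (Name TEXT);
    INSERT INTO SourceTable VALUES ('Ann'), ('Bob'), ('Cid'), ('Dee');
    -- One NULL name in the destination is enough to empty the NOT IN result.
    INSERT INTO DestinationTable VALUES ('Ann'), ('Bob'), (NULL);
""")

def names(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

in_rows = names("""
    SELECT Name FROM SourceTable
    WHERE Name IN (SELECT Name FROM DestinationTable)
    ORDER BY Name
""")
not_in_rows = names("""
    SELECT Name FROM SourceTable
    WHERE Name NOT IN (SELECT Name FROM DestinationTable)
    ORDER BY Name
""")
# NOT EXISTS compares row by row, so the NULL simply never matches.
not_exists_rows = names("""
    SELECT A.Name FROM SourceTable A
    WHERE NOT EXISTS (SELECT 1 FROM DestinationTable B WHERE B.Name = A.Name)
    ORDER BY A.Name
""")

print(in_rows, not_in_rows, not_exists_rows)
```

Here IN finds two of the four source names and NOT IN finds none, so the counts do not add up, while NOT EXISTS returns the two names that really are missing.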

Is this join overcomplicated?

I have inherited an application made by a previous developer. Some of the database calls are running slow in places where there is a large amount of data. I have found in general the SQL code is well written but there are places that make me think, 'what the..?'
Here is one example:
select a.*
from bs_ResearchEnquiry a
left join bs_StateWorkflowState_Map b
on (
select c.MapId from bs_StateWorkflowState_Map c
where c.StateId = a.StateId AND c.StateWorkflowId = a.StateWorkflowId
)=b.MapId
where
b.IsFinal=1
The MapId field is a unique primary key to the bs_StateWorkflowState_Map table.
StateId and StateWorkflowId together also form a unique key.
There will always be a match on these keys to rows in the foreign table bs_ResearchEnquiry
Therefore, could I rewrite the left join more efficiently, and safely, as:
inner join bs_StateWorkflowState_Map b
on b.StateId = a.StateId AND b.StateWorkflowId = a.StateWorkflowId
Or was the original developer trying to achieve something I've missed?

Your simplification looks good to me. Note that the presence of:
where b.IsFinal = 1
means that the outer join is effectively an inner join.

With your explanation on keys given, you are right, the query can be simplified. It selects records from bs_ResearchEnquiry where the associated bs_StateWorkflowState_Map record is final. So use EXISTS:
select *
from bs_ResearchEnquiry re
where exists
(
select *
from bs_StateWorkflowState_Map m
where m.StateId = re.StateId
and m.StateWorkflowId = re.StateWorkflowId
and m.IsFinal = 1
);
(From your explanation on uniqueness, I gather that there already exist indexes on (StateId, StateWorkflowId) in both tables. If not, create them.)
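As a quick sanity check of the equivalence, the sketch below runs the original subquery join, the proposed inner join and the EXISTS form against the same sample data, using Python's built-in sqlite3 module in place of SQL Server. The Id column on bs_ResearchEnquiry and all rows are hypothetical; the keys follow the description in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bs_StateWorkflowState_Map (
        MapId INTEGER PRIMARY KEY,
        StateId INTEGER NOT NULL,
        StateWorkflowId INTEGER NOT NULL,
        IsFinal INTEGER NOT NULL,
        UNIQUE (StateId, StateWorkflowId)
    );
    CREATE TABLE bs_ResearchEnquiry (
        Id INTEGER PRIMARY KEY,  -- hypothetical key column
        StateId INTEGER NOT NULL,
        StateWorkflowId INTEGER NOT NULL
    );
    INSERT INTO bs_StateWorkflowState_Map VALUES (1, 1, 1, 0), (2, 2, 1, 1), (3, 3, 1, 1);
    INSERT INTO bs_ResearchEnquiry VALUES (100, 1, 1), (101, 2, 1), (102, 3, 1), (103, 2, 1);
""")

queries = {
    "original": """
        SELECT a.Id FROM bs_ResearchEnquiry a
        LEFT JOIN bs_StateWorkflowState_Map b
          ON (SELECT c.MapId FROM bs_StateWorkflowState_Map c
              WHERE c.StateId = a.StateId
                AND c.StateWorkflowId = a.StateWorkflowId) = b.MapId
        WHERE b.IsFinal = 1
        ORDER BY a.Id""",
    "inner_join": """
        SELECT a.Id FROM bs_ResearchEnquiry a
        INNER JOIN bs_StateWorkflowState_Map b
          ON b.StateId = a.StateId AND b.StateWorkflowId = a.StateWorkflowId
        WHERE b.IsFinal = 1
        ORDER BY a.Id""",
    "exists": """
        SELECT re.Id FROM bs_ResearchEnquiry re
        WHERE EXISTS (SELECT 1 FROM bs_StateWorkflowState_Map m
                      WHERE m.StateId = re.StateId
                        AND m.StateWorkflowId = re.StateWorkflowId
                        AND m.IsFinal = 1)
        ORDER BY re.Id""",
}

results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
print(results)
```

All three forms return the same enquiries here. That relies on the unique key on (StateId, StateWorkflowId): without it the inner join could return an enquiry more than once, while EXISTS would not.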

SQL query inside a query

Allow me to share my query in an informal way (not following the proper syntax) as I'm a newbie - my apologies:
select * from table where
(
(category = Clothes)
OR
(category = games)
)
AND
(
(Payment Method = Cash) OR (Credit Card)
)
This is one part from my query. The other part is that from the output of the above, I don’t want to show the records meeting these criteria:
Category = Clothes
Branch = B3 OR B4 OR B5
Customer = Chris
VIP Level = 2 OR 3 OR 4 OR 5
SQL is not part of my job but I’m doing it to ease things for me. So you can consider me a newbie. I searched online, maybe I missed the solution.
Thank you,
HimaTech

There are a few ways of doing this (specifically within SQL; not looking at MDX here).
Probably the easiest way to understand would be to get the dataset that you want to exclude as a subquery, and use NOT EXISTS or NOT IN.
SELECT * FROM table
WHERE category IN ('clothes', 'games')
AND payment_method IN ('cash', 'credit card')
AND id NOT IN (
-- this is the subquery containing the results to exclude
SELECT id FROM table
WHERE category = 'clothes' [AND/OR]
branch IN ('B3', 'B4', 'B5') [AND/OR]
customer = 'Chris' [AND/OR]
vip_level IN (2, 3, 4, 5)
)
Another way is to left join the results you want to exclude onto the overall results, and filter them out using IS NULL, like so:
SELECT t1.*
FROM table AS t1
LEFT JOIN
(SELECT id FROM table
WHERE customer = 'chris' AND ...) -- results to exclude
AS t2 ON t1.id = t2.id
WHERE t2.id IS NULL
AND ... -- any other criteria
The trick here is that with a left join, when no row matches, the columns from the joined table come back as NULL. But this is certainly more difficult to get your head around.
There will also be different performance impacts from doing it either way, so it may be worth looking into it. This is probably a good place to start:
What's the difference between NOT EXISTS vs. NOT IN vs. LEFT JOIN WHERE IS NULL?
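Both approaches can be tried side by side on a handful of rows. The sketch below uses Python's built-in sqlite3 module; the table name sales, the column names and the data are hypothetical, and the four exclusion criteria are assumed to be combined with AND, which the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, category TEXT, branch TEXT,
                        customer TEXT, vip_level INTEGER, payment_method TEXT);
    INSERT INTO sales VALUES
        (1, 'clothes', 'B1', 'Chris', 2, 'cash'),
        (2, 'clothes', 'B3', 'Chris', 3, 'cash'),
        (3, 'games',   'B3', 'Chris', 3, 'credit card'),
        (4, 'clothes', 'B4', 'Dana',  2, 'cash'),
        (5, 'food',    'B1', 'Eve',   1, 'cash'),
        (6, 'clothes', 'B5', 'Chris', 1, 'credit card'),
        (7, 'games',   'B2', 'Fay',   5, 'cheque');
""")

# Rows to exclude: all four criteria must hold (assumed AND).
EXCLUDE = """
    SELECT id FROM sales
    WHERE category = 'clothes'
      AND branch IN ('B3', 'B4', 'B5')
      AND customer = 'Chris'
      AND vip_level IN (2, 3, 4, 5)
"""

not_in_sql = f"""
    SELECT id FROM sales
    WHERE category IN ('clothes', 'games')
      AND payment_method IN ('cash', 'credit card')
      AND id NOT IN ({EXCLUDE})
    ORDER BY id
"""

left_join_sql = f"""
    SELECT t1.id FROM sales AS t1
    LEFT JOIN ({EXCLUDE}) AS t2 ON t1.id = t2.id
    WHERE t2.id IS NULL
      AND t1.category IN ('clothes', 'games')
      AND t1.payment_method IN ('cash', 'credit card')
    ORDER BY t1.id
"""

not_in_ids = [row[0] for row in conn.execute(not_in_sql)]
left_join_ids = [row[0] for row in conn.execute(left_join_sql)]
print(not_in_ids, left_join_ids)
```

Only row 2 meets every exclusion criterion, so both queries drop it and keep the other clothes and games rows paid by cash or credit card. NOT IN is safe here because id is a primary key and cannot be NULL.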

Searching for the unique key (not in meta)

Which standard SQL Server tool can search for a unique key in a table's data (rather than in its metadata declarations)?
P.S. I am thinking of writing such a script myself. Maybe you could point me to a snippet for combinatorics in T-SQL, e.g. for generating all combinations of 1..n columns out of n?
P.P.S. About the problem's complexity, for those who do not see it: we do not need to analyze the whole data set to dismiss the hypothesis that a given pair of columns is the 'unique key'. With real-world, 'report-like', sorted data, I think many column combinations can be ruled out after analysing just the first two rows. So I feel such an algorithm should have a 'before full table compare' phase. The question is then what portion of the data to use for this phase. The best candidate I can think of is the 'page'... If the data is unique within the page, we test uniqueness on the whole table; if it is not unique (on the page), we move on to the next column set.

select t1.col, count(*)
from table t1
join table t2
on t1.col = t2.col
group by t1.col
having count(*) > 1
If zero rows are returned, then the column is unique.
For more than one column:
select t1.cola, t1.colb, count(*)
from table t1
join table t2
on t1.cola = t2.cola
and t1.colb = t2.colb
group by t1.cola, t1.colb
having count(*) > 1
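For the combinatorics part of the question, one option is to generate the column combinations outside SQL and run one uniqueness test per combination. The following is a brute-force sketch in Python with the built-in sqlite3 module, not a SQL Server tool: the report table and the function name minimal_unique_keys are hypothetical, and it scans the full table for every candidate rather than sampling a page first.

```python
import sqlite3
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (region TEXT, year INTEGER, product TEXT, amount INTEGER);
    INSERT INTO report VALUES
        ('north', 2020, 'apples', 10),
        ('north', 2021, 'apples', 10),
        ('south', 2020, 'apples', 30),
        ('south', 2020, 'pears',  30);
""")

def minimal_unique_keys(conn, table):
    """Return the smallest column sets whose values are unique across all rows."""
    # Column names come from the catalog, so they are interpolated directly.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    total = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # A superset of a known key is unique too, so skip it.
            if any(set(key) <= set(combo) for key in keys):
                continue
            col_list = ", ".join(f'"{c}"' for c in combo)
            distinct = conn.execute(
                f'SELECT COUNT(*) FROM (SELECT DISTINCT {col_list} FROM "{table}")'
            ).fetchone()[0]
            if distinct == total:
                keys.append(combo)
    return keys

keys = minimal_unique_keys(conn, "report")
print(keys)
```

A combination is a candidate key when the number of distinct value tuples equals the row count. The cheap pre-check described in the question could be added by running the same comparison on a small sample of rows before the full scan.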

T-SQL filtering on dynamic name-value pairs

I'll describe what I am trying to achieve:
I am passing an XML document with name-value pairs down to a stored procedure, which I put into a table variable, let's say @nameValuePairs.
I need to retrieve a list of IDs for expressions (a table) that have exactly those name-value pairs (attributes, another table) associated.
This is my schema:
Expressions table --> (expressionId, attributeId)
Attributes table --> (attributeId, attributeName, attributeValue)
After trying complicated stuff with dynamic SQL and evil cursors (which works but it's painfully slow) this is what I've got now:
--do the magic plz!
-- retrieve number of name-value pairs
SET @noOfAttributes = (select count(*) from @nameValuePairs)
select distinct
e.expressionId, a.attributeName, a.attributeValue
into
#temp
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
group by
e.expressionId, a.attributeName, a.attributeValue
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select distinct
expressionId
from
#temp
group by expressionId
having count(*) = @noOfAttributes
Can people please review and see if they can spot any problems? Is there a better way of doing this?
Any help appreciated!

I believe that this would satisfy the requirement you're trying to meet. I'm not sure how much prettier it is, but it should work and wouldn't require a temp table:
SET @noOfAttributes = (select count(*) from @nameValuePairs)
SELECT e.expressionid
FROM expressions e
LEFT JOIN (
SELECT attributeid
FROM attributes a
JOIN @nameValuePairs nvp ON nvp.name = a.attributeName AND nvp.value = a.attributeValue
) t ON t.attributeid = e.attributeid
GROUP BY e.expressionid
HAVING SUM(CASE WHEN t.attributeid IS NULL THEN (@noOfAttributes + 1) ELSE 1 END) = @noOfAttributes
EDIT: After doing some more evaluation, I found an issue where certain expressions would be included that shouldn't have been. I've modified my query to take that into account.

One error I see is that you have no table with an alias of b, yet you are using: a.attributeId = b.attributeId.
Try fixing that and see if it works, unless I am missing something.
EDIT: I think you just fixed this in your edit, but is it supposed to be a.attributeId = e.attributeId?

This is not a bad approach, depending on the sizes and indexes of the tables, including @nameValuePairs. If these row counts are high or it otherwise becomes slow, you may do better to put @nameValuePairs into a temp table instead, add appropriate indexes, and use a single query instead of two separate ones.
I do notice that you are putting columns into #temp that you are not using; it would be faster to exclude them (though it would mean duplicate rows in #temp). Also, your second query has both a "distinct" and a "group by" on the same columns. You don't need both, so I would drop the "distinct" (it probably won't affect performance, because the optimizer already figures this out).
Finally, #temp would probably be faster with a clustered non-unique index on expressionId (I am assuming that this is SQL 2005). You could add it after the SELECT..INTO, but it is usually as fast or faster to add it before you load. This would require you to CREATE #temp first, add the clustered index, and then use INSERT..SELECT to load it instead.
I'll add an example of merging the queries in a minute... OK, here's one way to merge them into a single query (this should be 2000-compatible also):
-- retrieve number of name-value pairs
SET @noOfAttributes = (select count(*) from @nameValuePairs)
-- now select the IDs I need
-- since the derived table below is distinct, if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select
expressionId
from
(
select distinct
e.expressionId, a.attributeName, a.attributeValue
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
) as Temp
group by expressionId
having count(*) = @noOfAttributes
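To check what the merged query returns, here is a small sketch that runs the same join-and-count logic through Python's built-in sqlite3 module. The sample rows are made up, and an ordinary table named nameValuePairs stands in for the table variable, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (attributeId INTEGER PRIMARY KEY,
                             attributeName TEXT, attributeValue TEXT);
    CREATE TABLE expressions (expressionId INTEGER, attributeId INTEGER);
    CREATE TABLE nameValuePairs (name TEXT, value TEXT);  -- stand-in for the table variable
    INSERT INTO attributes VALUES
        (1, 'color', 'red'), (2, 'size', 'L'), (3, 'size', 'M'), (4, 'color', 'blue');
    INSERT INTO expressions VALUES
        (1, 1), (1, 2),           -- has both requested pairs
        (2, 1), (2, 3),           -- has color=red only
        (3, 1), (3, 2), (3, 4);   -- has both pairs plus an extra attribute
""")

pairs = [("color", "red"), ("size", "L")]
conn.executemany("INSERT INTO nameValuePairs VALUES (?, ?)", pairs)
no_of_attributes = conn.execute("SELECT COUNT(*) FROM nameValuePairs").fetchone()[0]

# Same shape as the merged query: distinct matches per expression, then count.
matching = [row[0] for row in conn.execute("""
    SELECT expressionId
    FROM (
        SELECT DISTINCT e.expressionId, a.attributeName, a.attributeValue
        FROM expressions e
        JOIN attributes a ON e.attributeId = a.attributeId
        JOIN nameValuePairs nvp
          ON a.attributeName = nvp.name AND a.attributeValue = nvp.value
    ) AS Temp
    GROUP BY expressionId
    HAVING COUNT(*) = ?
    ORDER BY expressionId
""", (no_of_attributes,))]

print(matching)
```

Expression 3 is returned even though it carries an extra attribute, because the join only counts the pairs that match. If "exact match" is meant to rule out extra attributes, a further condition on the total number of attributes per expression would be needed.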
