Find Child with Parent having specific information - sql-server

I am trying to find children whose parent has specific information stored in different related tables.
I have four tables, as shown below.
Search criteria: get every "Section" whose parent is an "Inventory" level, where the attached user's name contains the letter 'a' and the role is 'employee' (please see the LevelsUser table for the relation).
I tried a CTE (common table expression) approach to find the correct Section level, but there I have to pass the level Id as a hard-coded value, so I cannot search all Sections in the table.
WITH LevelsTree AS
(
SELECT Id, ParentLevelId, Level
FROM Levels
WHERE Level='Section' -- here I need to pass a value
UNION ALL
SELECT ls.Id, ls.ParentLevelId, ls.Level
FROM Levels ls
JOIN LevelsTree lt ON ls.Id = lt.ParentLevelId
)
SELECT * FROM LevelsTree
I need to find all sections that match the above criteria.
Please help me here.

For hierarchical checks you need to select from the Levels table and then join it to itself. Something like this should help:
declare @parentLevelName varchar(20) = 'Inventory';
with cte as (
    select distinct
        l1.id,
        l1.Level
    from Levels l1
    join Levels l2 on l2.id = l1.ParentLevelId
        and l2.Level = @parentLevelName -- use a variable instead of the hard-coded 'Inventory'
    where l1.Level = 'Section' -- replace 'Section' with a @variable containing your value
)
select * from cte
join LevelUsers lu on lu.LevelId = cte.id
join Users u on u.Id = lu.UserId
    and u.UserName like '%a%' -- this letter check is not efficient
join Role r on r.id = lu.RoleId and r.Role = 'employee'
Note that the above query selects data only from the four tables you described in your schema. However, your original query contains a reference to the HierarchyPosition table, which you haven't described. If you really need to include the HierarchyPosition reference, then specify how it relates to the other four tables.
Also note that the condition and u.UserName like '%a%', used to satisfy your requirement that the user name contains the letter 'a', is not efficient because the leading % prevents an index seek. If possible, consider changing the requirement to "user name starts with the letter 'a'". That way and u.UserName like 'a%' can use an index on the UserName column of the Users table, if one exists.
HTH
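For anyone who wants to try the self-join idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the answer above, but the sample rows are invented for illustration, and SQLite stands in for SQL Server, so treat it as a sanity check of the join logic rather than a T-SQL reference. Like the answer, it assumes the user is attached to the Section itself rather than to its parent.

```python
import sqlite3

# In-memory database with a made-up sample of the four tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Levels (Id INTEGER PRIMARY KEY, ParentLevelId INTEGER, Level TEXT);
CREATE TABLE Users (Id INTEGER PRIMARY KEY, UserName TEXT);
CREATE TABLE Role (Id INTEGER PRIMARY KEY, Role TEXT);
CREATE TABLE LevelUsers (LevelId INTEGER, UserId INTEGER, RoleId INTEGER);

INSERT INTO Levels VALUES (1, NULL, 'Inventory'), (2, 1, 'Section'),
                          (3, NULL, 'Warehouse'), (4, 3, 'Section'),
                          (5, 1, 'Section');
INSERT INTO Users VALUES (1, 'adam'), (2, 'bob');
INSERT INTO Role VALUES (1, 'employee'), (2, 'manager');
INSERT INTO LevelUsers VALUES (2, 1, 1), (4, 1, 1), (5, 2, 1);
""")

# Self-join: "child" is the Section row, "parent" is the row it points to.
query = """
SELECT DISTINCT child.Id
FROM Levels AS child
JOIN Levels AS parent
  ON parent.Id = child.ParentLevelId AND parent.Level = :parent_level
JOIN LevelUsers AS lu ON lu.LevelId = child.Id
JOIN Users AS u ON u.Id = lu.UserId AND u.UserName LIKE '%a%'
JOIN Role AS r ON r.Id = lu.RoleId AND r.Role = 'employee'
WHERE child.Level = :child_level
ORDER BY child.Id
"""
params = {"parent_level": "Inventory", "child_level": "Section"}
section_ids = [row[0] for row in conn.execute(query, params)]
print(section_ids)
```

In this sample only Section 2 qualifies: Section 4 hangs under a Warehouse, and Section 5 is attached to a user whose name has no 'a'.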

Related

SQL View Optimization

I am trying to build a view that basically does two things: check whether a record in table 1 is in table 2, and check whether a link to another table is still there. It worked on a subset of the data, but when I tried to run the full query it timed out in the view designer.
The view worked fine until I added in the check to see whether the link to another table was present.
Initially it joined Table A to Table B and filtered out rows where A.ID wasn't present in the ID column of Table B.
I was then told that if the link between the person and the address table (stored in Table C) was removed, we would have no way of knowing other than to get a full extract of that table again and see which links are no longer present. I am trying to use that check to determine whether to display data in particular columns.
I am using the following structure close to 60 times to choose whether to show information in a column:
Column1 = case when exists (select LinkID from LinkTable C
where cast(C.LinkAddressID as varchar) = A.AddressID
and cast(C.LinkID as varchar) = A.ID)
then Column1
else NULL
end
There are about 1.6m records in Table A and just over 4m records in the Link table.
Is there a better way to write this query / view so that it is more optimized?
Please let me know if more information is needed.
Select C.LinkID
From A
Left Join C On C.LinkAddressID = A.AddressID And C.LinkID = A.ID
This will give you C.LinkID if a row matches both conditions, and NULL otherwise.
Having indexes / keys, such as a primary key on A.ID and foreign key relationships on the columns used in the join clause, should provide good performance.
As Joe suggested, if all 60 columns use the same AddressID and ID fields to match the two tables, I believe you can use something like the following query:
SELECT
Column1 = CASE WHEN C.LinkID IS NULL THEN NULL ELSE A.Column1 END,
....
FROM A
Left Join LinkTable C
ON C.LinkAddressID = A.AddressID AND C.LinkID = A.ID
Casting data types in a join or WHERE clause will generally prevent the optimizer from using an index on the cast column, so avoid data type casts in joins and WHERE clauses where possible.
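The single LEFT JOIN plus CASE pattern suggested above can be sketched as follows, using Python's sqlite3 module as a stand-in for SQL Server. The three sample rows and the index name IX_Link are made up for the demo; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID INTEGER PRIMARY KEY, AddressID INTEGER, Column1 TEXT);
CREATE TABLE LinkTable (LinkID INTEGER, LinkAddressID INTEGER);
-- Hypothetical index covering both join columns.
CREATE INDEX IX_Link ON LinkTable (LinkID, LinkAddressID);

INSERT INTO A VALUES (1, 10, 'kept'), (2, 20, 'hidden'), (3, 30, 'also kept');
INSERT INTO LinkTable VALUES (1, 10), (3, 30), (2, 99);
""")

# One join decides visibility for the whole row; each of the ~60 columns
# then only needs a cheap CASE on C.LinkID instead of its own EXISTS.
rows = conn.execute("""
SELECT A.ID,
       CASE WHEN C.LinkID IS NULL THEN NULL ELSE A.Column1 END AS Column1
FROM A
LEFT JOIN LinkTable AS C
       ON C.LinkAddressID = A.AddressID AND C.LinkID = A.ID
ORDER BY A.ID
""").fetchall()
print(rows)
```

One caveat worth checking against your own data: unlike EXISTS, the LEFT JOIN returns a row of A once per matching link row, so if LinkTable can hold duplicate (LinkID, LinkAddressID) pairs you would need to de-duplicate them first.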

Full-text Search on documents and related data mssql

I am currently in the middle of building a knowledge base app and am a bit unsure about the best way to store and index the document information.
The user uploads a document and, when doing so, selects a number of options from dropdown lists (such as category, topic, area...; note that these are not all mandatory). They also enter some keywords and a description of the document. At the moment the selected category (and the others) is stored as a foreign key in the documents table, using the id from the categories table.
What we want to be able to do is run a FREETEXTTABLE or CONTAINSTABLE query not only on the varchar(max) column where the document is stored, but also on the category name, topic name, area name, etc.
I looked at the option of creating an indexed view, but this wasn't possible because of the LEFT JOIN against the category column. So I'm not sure how to go about this; any ideas would be most appreciated.
I assume that you want to AND the two searches together, for example to find all documents containing the text "foo" AND in the category "Automotive Repair".
Perhaps you don't need full text on the additional data and can just use = or LIKE? If the additional data is reasonably small, it may not warrant the complication of full text.
However, if you want to use full text on both, use a stored procedure that pulls the results together for you. The trick here is to stage the results rather than trying to get a result set back straight away.
This is a rough starting point.
-- a staging table variable for the document results
declare @documentResults table (
    Id int,
    [Rank] int
)
insert into @documentResults
select d.Id, results.[rank]
from containstable (documents, (text), '"foo*"') results
inner join documents d on results.[key] = d.Id
-- now you have all of the primary keys that match the search criteria
-- whittle this list down to only include keys that are in the correct categories
-- a staging table variable for each of the metadata results
declare @categories table (
    Id int
)
insert into @categories
select results.[KEY]
from containstable (Categories, (Category), '"Automotive Repair*"') results
declare @topics table (
    Id int
)
insert into @topics
select results.[KEY]
from containstable (Topics, (Topic), '"Automotive Repair*"') results
declare @areas table (
    Id int
)
insert into @areas
select results.[KEY]
from containstable (Areas, (Area), '"Automotive Repair*"') results
-- the staging tables hold only keys, so return the document columns and the rank
select d.text, d.CategoryId, d.TopicId, d.AreaId, r.[Rank]
from @documentResults r
inner join documents d on d.Id = r.Id
inner join @categories c on c.Id = d.CategoryId
inner join @topics t on t.Id = d.TopicId
inner join @areas a on a.Id = d.AreaId
You could create a new column for your full text index which would contain the original document plus the categories appended as metadata. Then a search on that column could search both the document and the categories simultaneously. You'd need to invent a tagging system that would keep them unique within your document yet the tags would not be likely to be used as search phrases themselves. Perhaps something like:
This is my regular document text. <FTCategory: Automotive Repair> <FTCategory: Transmissions>

Search in multiple tables with Full-Text

I'm trying to build a detailed search with ASP and SQL Server Full-Text Search.
When a keyword is submitted, I need to search in multiple tables. For example,
Table - Members
member_id
contact_name
Table - Education
member_id
school_name
My query;
select mem.member_id, mem.contact_name, edu.member_id, edu.school_name from Members mem FULL OUTER JOIN Education edu on edu.member_id=mem.member_id where CONTAINS (mem.contact_name, '""*"&keyword&"*""') or CONTAINS (edu.school_name, '""*"&keyword&"*""') order by mem.member_id desc;
This query works, but it takes a really long time to execute.
Imagine that the keyword is Phill: if mem.contact_name matches, list that member; or if edu.school_name matches, list the members whose education matches the keyword.
I hope I explained it well :) Sorry for my English, though.
Perhaps try an indexed view containing the merged dataset. You can add the full-text index there instead of on the individual tables, and it can be extended to as many tables as you need down the line. The only catch, of course, is the space...
This is what I would do for a multi-table full-text search.
It's not exact, but it gives the basic idea. The key thing is to use one CONTAINS per table, combined with OR.
DECLARE @SearchTerm NVARCHAR(250)
SET @SearchTerm = '"Texas*"'
SELECT * FROM table1
JOIN table2 ON table1.Id = table2.FKID
WHERE (
    (@SearchTerm = '""') OR
    CONTAINS((table1.column1, table1.column2, table1.column3), @SearchTerm) OR
    CONTAINS((table2.column1, table2.column2), @SearchTerm)
)
There are a couple of points I don't understand that will be affecting your speed.
Do you really need a full outer join? That's killing you. It looks like these tables are one to one. In that case can't you make it an inner join?
Can't you pass a column list to contains like so:
SELECT mem.member_id,
mem.contact_name,
edu.member_id,
edu.school_name
FROM members mem
INNER JOIN education edu ON edu.member_id = mem.member_id
WHERE Contains((mem.contact_name,edu.school_name),'*keyword*')
ORDER BY mem.member_id DESC
Further info about contains.

T-SQL filtering on dynamic name-value pairs

I'll describe what I am trying to achieve:
I am passing an XML document of name-value pairs down to a stored procedure, which I put into a table variable, let's say @nameValuePairs.
I need to retrieve a list of IDs for expressions (a table) that have exactly those name-value pairs (attributes, another table) associated with them.
This is my schema:
Expressions table --> (expressionId, attributeId)
Attributes table --> (attributeId, attributeName, attributeValue)
After trying complicated stuff with dynamic SQL and evil cursors (which works but it's painfully slow) this is what I've got now:
--do the magic plz!
-- retrieve number of name-value pairs
SET @noOfAttributes = (select count(*) from @nameValuePairs)
select distinct
e.expressionId, a.attributeName, a.attributeValue
into
#temp
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
group by
e.expressionId, a.attributeName, a.attributeValue
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select distinct
expressionId
from
#temp
group by expressionId
having count(*) = @noOfAttributes
Can people please review and see if they can spot any problems? Is there a better way of doing this?
Any help appreciated!
I believe that this would satisfy the requirement you're trying to meet. I'm not sure how much prettier it is, but it should work and wouldn't require a temp table:
SET @noOfAttributes = (select count(*) from @nameValuePairs)
SELECT e.expressionid
FROM expressions e
LEFT JOIN (
    SELECT a.attributeid
    FROM attributes a
    JOIN @nameValuePairs nvp ON nvp.name = a.attributeName AND nvp.value = a.attributeValue
) t ON t.attributeid = e.attributeid
GROUP BY e.expressionid
HAVING SUM(CASE WHEN t.attributeid IS NULL THEN (@noOfAttributes + 1) ELSE 1 END) = @noOfAttributes
EDIT: After doing some more evaluation, I found an issue where certain expressions would be included that shouldn't have been. I've modified my query to take that into account.
One error I see is that you have no table with an alias of b, yet you are using: a.attributeId = b.attributeId.
Try fixing that and see if it works, unless I am missing something.
EDIT: I think you just fixed this in your edit, but is it supposed to be a.attributeId = e.attributeId?
This is not a bad approach, depending on the sizes and indexes of the tables, including @nameValuePairs. If these row counts are high, or it otherwise becomes slow, you may do better to put @nameValuePairs into a temp table instead, add appropriate indexes, and use a single query instead of two separate ones.
I do notice that you are putting columns into #temp that you are not using; it would be faster to exclude them (though it would mean duplicate rows in #temp). Also, your second query has both a "distinct" and a "group by" on the same columns. You don't need both, so I would drop the "distinct" (it probably won't affect performance, because the optimizer has already figured this out).
Finally, #temp would probably be faster with a clustered non-unique index on expressionId (I am assuming that this is SQL 2005). You could add it after the SELECT..INTO, but it is usually as fast or faster to add it before you load. This would require you to CREATE #temp first, add the clustered index, and then use INSERT..SELECT to load it instead.
I'll add an example of merging the queries in a minute... OK, here's one way to merge them into a single query (this should also be 2000-compatible):
-- retrieve number of name-value pairs
SET @noOfAttributes = (select count(*) from @nameValuePairs)
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select
expressionId
from
(
select distinct
e.expressionId, a.attributeName, a.attributeValue
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
) as Temp
group by expressionId
having count(*) = @noOfAttributes
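The "count the matches and compare with the number of pairs" technique used in these queries can be tried out with the small sketch below. It uses Python's sqlite3 module instead of SQL Server, a regular table named nameValuePairs in place of the table variable, and invented sample data, so it is only an illustration of the counting logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attributes (attributeId INTEGER PRIMARY KEY,
                         attributeName TEXT, attributeValue TEXT);
CREATE TABLE expressions (expressionId INTEGER, attributeId INTEGER);
-- Stand-in for the table variable that holds the requested pairs.
CREATE TABLE nameValuePairs (name TEXT, value TEXT);

INSERT INTO attributes VALUES (1, 'color', 'red'), (2, 'size', 'L'), (3, 'size', 'M');
INSERT INTO expressions VALUES (100, 1), (100, 2), (200, 1), (200, 3), (300, 1);
INSERT INTO nameValuePairs VALUES ('color', 'red'), ('size', 'L');
""")

# Keep an expression only if the number of distinct matched pairs
# equals the number of requested pairs.
matching_ids = [row[0] for row in conn.execute("""
SELECT expressionId
FROM (
    SELECT DISTINCT e.expressionId, a.attributeName, a.attributeValue
    FROM expressions AS e
    JOIN attributes AS a ON a.attributeId = e.attributeId
    JOIN nameValuePairs AS nvp
      ON nvp.name = a.attributeName AND nvp.value = a.attributeValue
) AS matched
GROUP BY expressionId
HAVING COUNT(*) = (SELECT COUNT(*) FROM nameValuePairs)
ORDER BY expressionId
""")]
print(matching_ids)
```

Note that this form accepts any expression that has at least the requested pairs. An expression carrying extra attributes on top of them would still be returned, so if "exact match" means "these pairs and nothing else", an additional check on the total attribute count per expression is needed.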

SQL Server 2005 Performance: Distinct or full table in WHERE IN statement

We have two Tables:
Document: id, title, document_type_id, showon_id
DocumentType: id, name
Relationship: DocumentType hasMany Documents. (Document.document_type_id = DocumentType.id)
We wish to retrieve a list of all document types for one given ShowOn_Id.
We see two possibilities:
SELECT DocumentType.*
FROM DocumentType
WHERE DocumentType.id IN (
SELECT DISTINCT Document.document_type_id FROM Document WHERE showon_id = 42
);
SELECT DocumentType.*
FROM DocumentType
WHERE DocumentType.id IN (
SELECT Document.document_type_id FROM Document WHERE showon_id = 42
);
Our question is: when, if ever, is it better to use DISTINCT to get the smaller record set, versus retrieving the whole set and letting the IN clause walk it to the first match? (We guess that's what it does ;-))
Is this different for different databases, is there a common answer?
Or is there a better way of doing it? (We are in .NET land)
You can use a join:
SELECT DISTINCT DocumentType.*
FROM DocumentType
INNER JOIN Document
ON DocumentType.id=Document.document_type_id
WHERE Document.showon_id = 42
I think it's the best way to do it.
For the best performance you should use:
SELECT DISTINCT dt.*
FROM
DocumentType dt
INNER JOIN Document d ON dt.id=d.document_type_id and d.showon_id = 42
Joins are very efficient at bridging multiple tables, whereas the nested query in the WHERE clause needs to perform a separate result selection that filters down the FROM clause results. The join statement is also much more readable.
I would also put an index on showon_id, in addition to the primary keys and foreign key relationship.
My answer differs from wmasm's answer only by moving the showon_id filter up into the inner join. For MS SQL 2k5, I think the optimizer is smart enough to do this automatically, but you always want to work with the smallest result set possible. Bringing your filters up into the inner join conditions can limit the number of rows the query has to work with when joining many tables together. If you do this, though, you should understand that the filter is evaluated for every row comparison, so complex filters (such as x like '%a' or function calls) are better left in the WHERE clause so that the inner joins can first filter out unnecessary comparisons.
Use an EXISTS. It is sometimes faster and, in my opinion, more readable than a DISTINCT and JOIN. Just for kicks, please reply with the query plan for this query and for the JOIN above, and see if anything is different (they may be optimized down to the same plan). If they are the same, I'd recommend the EXISTS, as it is closer to a "plain language" description than a JOIN (because you don't want any of the data from Document, etc.).
SELECT whatever
FROM DocumentType dt
WHERE EXISTS( SELECT *
FROM Document
WHERE dt.id = document_type_id
AND showon_id = 42)
To get the query plan (ref: http://msdn.microsoft.com/en-us/library/ms180765(SQL.90).aspx), do:
SET SHOWPLAN_TEXT ON
GO
SELECT ...
GO
From my point of view it should not make any difference inside SQL Server (but who knows how this is implemented).
Think of it this way: to return the result set, the server needs to go into the Document table and retrieve all document_type_id values WHERE showon_id = 42. In the process of retrieving the document_type_ids (e.g. by index seeking) it puts them into a hash table. When this process has finished, the hash table will contain distinct values anyway. After that, the query execution goes into the Document_Type table, scans the primary key and probes into the hash table. Note that this depends on the data; e.g. it may be more efficient not to use a hash table when the expected row count from the Document table is low compared to Document_Type. In general, though, you get the same query plan as for the query wmasm just suggested.
Follow up on Matt's answer:
I've enabled the query plan and tested the following four different queries that have come up so far:
SELECT DocumentType.* FROM DocumentType WHERE DocumentType.id IN (SELECT DISTINCT Document.document_type_id FROM Document WHERE showon_id = 42);
SELECT DocumentType.* FROM DocumentType WHERE DocumentType.id IN (SELECT Document.document_type_id FROM Document WHERE showon_id = 42);
SELECT DISTINCT DocumentType.* FROM DocumentType INNER JOIN Document ON DocumentType.id=Document.document_type_id WHERE Document.showon_id = 42;
SELECT DocumentType.* FROM DocumentType WHERE EXISTS ( SELECT * FROM Document WHERE DocumentType.id=Document.document_type_id AND showon_id = 42);
The query plan for all four queries turned out to be the same:
|--Hash Match(Right Semi Join, HASH:([Document].[document_type_id])=([DocumentType].[Id]))
     |--Hash Match(Inner Join, HASH:([Document].[Title], [Uniq1005])=([Document].[Title], [Uniq1005]), RESIDUAL:([Document].[Title] as [Document].[Title] = [Document].[Title] as [Document].[Title] AND [Uniq1005] = [Uniq1005]))
     |    |--Index Seek(OBJECT:([Document].[IX_Document_3] AS [Document]), SEEK:([Document].[showon_id]=(1)) ORDERED FORWARD)
     |    |--Index Scan(OBJECT:([Document].[IX_Document_1] AS [Document]))
     |--Table Scan(OBJECT:([DocumentType] AS [DocumentType]))
I am not sure what every line and element means, but it seems that from the performance perspective it does not matter how you construct the query for this kind of problem...
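To double-check that the four forms at least return the same rows, here is a small sketch using Python's sqlite3 module. It says nothing about SQL Server's query plans or timings. It only compares result sets, on invented sample data with a hypothetical index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DocumentType (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Document (id INTEGER PRIMARY KEY, title TEXT,
                       document_type_id INTEGER, showon_id INTEGER);
-- Hypothetical index on the filter column.
CREATE INDEX IX_Document_showon ON Document (showon_id);

INSERT INTO DocumentType VALUES (1, 'Invoice'), (2, 'Report'), (3, 'Memo');
INSERT INTO Document VALUES (1, 'a', 1, 42), (2, 'b', 1, 42),
                            (3, 'c', 2, 42), (4, 'd', 3, 7);
""")

# The four formulations discussed in the thread, reduced to the id column.
queries = {
    "in_distinct": """SELECT dt.id FROM DocumentType dt
                      WHERE dt.id IN (SELECT DISTINCT d.document_type_id
                                      FROM Document d WHERE d.showon_id = 42)
                      ORDER BY dt.id""",
    "in_plain": """SELECT dt.id FROM DocumentType dt
                   WHERE dt.id IN (SELECT d.document_type_id
                                   FROM Document d WHERE d.showon_id = 42)
                   ORDER BY dt.id""",
    "join_distinct": """SELECT DISTINCT dt.id FROM DocumentType dt
                        INNER JOIN Document d ON dt.id = d.document_type_id
                        WHERE d.showon_id = 42
                        ORDER BY dt.id""",
    "exists": """SELECT dt.id FROM DocumentType dt
                 WHERE EXISTS (SELECT * FROM Document d
                               WHERE d.document_type_id = dt.id
                                 AND d.showon_id = 42)
                 ORDER BY dt.id""",
}
results = {name: [r[0] for r in conn.execute(sql)] for name, sql in queries.items()}
for name, ids in results.items():
    print(name, ids)
```

Document type 1 has two documents with showon_id = 42, which is why the JOIN form needs the DISTINCT while the IN and EXISTS forms do not.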
