Search in multiple tables with Full-Text

I'm trying to build a detailed search with ASP and SQL Server Full-Text Search.
When a keyword is submitted, I need to search in multiple tables. For example:
Table - Members
member_id
contact_name
Table - Education
member_id
school_name
My query;
select mem.member_id, mem.contact_name, edu.member_id, edu.school_name from Members mem FULL OUTER JOIN Education edu on edu.member_id=mem.member_id where CONTAINS (mem.contact_name, '""*"&keyword&"*""') or CONTAINS (edu.school_name, '""*"&keyword&"*""') order by mem.member_id desc;
This query works, but it takes a really long time to execute.
Imagine that the keyword is Phill: if mem.contact_name matches, list that member; if edu.school_name matches, list the members whose education matches the keyword.
I hope I explained it well :) Sorry for my English.

Perhaps try an indexed view containing the merged dataset. You can add the full-text index there instead of on the individual tables, and it extends to as many tables as you need down the line. The only catch, of course, is the space...

This is what I would do for a multi-table full-text search.
It's not exact, but it gives the basic idea. The key is to use one CONTAINS per table and combine them with OR.
DECLARE @SearchTerm NVARCHAR(250)
SET @SearchTerm = '"Texas*"'
SELECT * FROM table1
JOIN table2 on table1.Id = table2.FKID
WHERE (
(@SearchTerm = '""') OR
CONTAINS((table1.column1, table1.column2, table1.column3), @SearchTerm) OR
CONTAINS((table2.column1, table2.column2), @SearchTerm)
)

Couple of points I don't understand that will be affecting your speed.
Do you really need a full outer join? That's killing you. It looks like these tables are one to one. In that case can't you make it an inner join?
Can't you pass a column list to contains like so:
SELECT mem.member_id,
mem.contact_name,
edu.member_id,
edu.school_name
FROM members mem
INNER JOIN education edu ON edu.member_id = mem.member_id
WHERE Contains((mem.contact_name,edu.school_name),'*keyword*')
ORDER BY mem.member_id DESC
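See the CONTAINS documentation for further information. Two caveats: a full-text predicate works against a single table, so columns from two tables may need one CONTAINS per table combined with OR; and a prefix term is written as '"keyword*"', since leading wildcards are not supported.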

Related

Is there an equivalent to OR clause in CONTAINSTABLE - FULL TEXT INDEX

I am trying to improve string searching and chose a full-text index strategy.
However, after implementing it, I still see a performance hit when searching for multiple strings across multiple full-text indexed tables with OR clauses
(e.g. WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%')).
As a solution, I am trying to use CONTAINSTABLE, expecting a performance improvement.
Now I am facing an issue with CONTAINSTABLE when joining tables with a LEFT JOIN.
Please go through the example below.
Query 1
SELECT F.Name,p.*
FROM P.Role PR
INNER JOIN P.Building F ON PR.PID = F.PID
LEFT JOIN CONTAINSTABLE(P.Building,*,'%John%') AS FFTIndex ON F.ID = FFTIndex.[Key]
LEFT JOIN P.Relationship PRSHIP ON PR.id = prship.ToRoleID
LEFT JOIN P.Role PR2 ON PRSHIP.ToRoleID = PR2.ID
LEFT JOIN P.Person p ON pr2.ID = p.PID
LEFT JOIN CONTAINSTABLE(P.Person,FirstName,'%John%') AS PFTIndex ON P.ID = PFTIndex.[Key]
WHERE F.Name IS NOT NULL
This produces the below result.
Query 2
SELECT F.Name,p.*
FROM P.Role PR
INNER JOIN P.Building F ON PR.PID = F.PID
INNER JOIN P.Relationship PRSHIP ON PR.id = prship.ToRoleID
INNER JOIN P.Role PR2 ON PRSHIP.ToRoleID = PR2.ID
INNER JOIN P.Person p ON pr2.ID = p.PID
WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%')
AND F.Name IS NOT NULL
Result
Expectation
I want Query 1 to behave like a SQL Server OR clause. As I understand it, Query 1's first CONTAINSTABLE joins its matches to the Building table and the remaining rows are ignored, so the CONTAINSTABLE on the Person table only sees rows already filtered by the keyword on the Building table.
If the keyword = Building, I want to match the keyword in both tables without requiring a matching record in both of them. Having a matching record in either table is enough.
Summary
Query 2 performs well but becomes slow as the number of words in the indexes grows. Query 1 seems optimized (according to multiple online resources and the MS documentation);
however, it does not give me the expected output.
Is there any way to solve this problem?
I am not strictly attached to CONTAINSTABLE. Suggestions for another optimization method are also welcome.
Thank you.

Hard to say definitively without your full data set, but here are a few options to explore.
Remove Invalid % Wildcards
Why are you using '%SearchTerm%'? Does performance improve if you use the search term without the wildcards (%)? If you want a word that matches a prefix, try something like
WHERE CONTAINS (String,'"SearchTerm*"')
Try Temp Tables
My guess is that CONTAINS is slightly faster than CONTAINSTABLE because it doesn't calculate a rank, but I don't know if anyone has ever benchmarked it. Either way, I'd try saving the matches to a temp table before joining to the rest of the tables. This may allow the optimizer to create a better execution plan.
SELECT ID INTO #Temp
FROM YourTable
WHERE CONTAINS (String,'"SearchTerm"')
SELECT *
FROM #Temp
INNER JOIN...
Optimize Full Text Index by Removing Noisy Words
You might find you have some noisy words, i.e. words that recur many times in your data but carry no meaning, like "the" or perhaps some business jargon. Adding these to your stoplist means your full-text index will ignore them, making the index smaller and thus faster.
The query below will list indexed words with the most frequent at the top
Select *
From sys.dm_fts_index_keywords(Db_Id(),Object_Id('dbo.YourTable') /*Replace with your table name*/)
Order By document_count Desc
This OR That Criteria
Your WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%') criteria, where you want this or that, is tricky. OR clauses generally perform poorly, even when using simple equality operators.
I'd try running two queries and combining the results with UNION, like:
SELECT * FROM Table1 F
/*Other joins and stuff*/
WHERE CONTAINS(F.*,'%Gayan%')
UNION
SELECT * FROM Table2 P
/*Other joins and stuff*/
WHERE CONTAINS(P.FirstName,'%John%')
Or, and this is much more work, you could load all your data into one giant denormalized table with all your columns. Then apply a full-text index to that table and adjust your search criteria accordingly. It would probably be the fastest method for searching, but you'd have to keep the data in sync between the denormalized table and the underlying normalized tables.
SELECT B.*,P.* INTO DenormalizedTable
FROM Building AS B
INNER JOIN People AS P ON ... /* your join condition */
CREATE FULLTEXT INDEX ON DenormalizedTable ... /* columns, KEY INDEX, and so on */
etc...
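
The "two queries plus UNION" suggestion above can be tried out in miniature. What follows is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, LIKE prefix matches in place of CONTAINS (SQLite has no SQL Server full-text predicates), and hypothetical building/person tables. It shows that the OR form and the UNION form return the same rows; it says nothing about how SQL Server would plan either one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE building (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, building_id INTEGER, first_name TEXT);
    INSERT INTO building VALUES (1, 'Gayan Tower'), (2, 'Main Hall'), (3, 'Annex');
    INSERT INTO person VALUES (10, 1, 'Mary'), (11, 2, 'John'), (12, 3, 'Ann');
""")

# One query with OR across two tables (the shape that was slow in the question).
or_rows = conn.execute("""
    SELECT b.id AS building_id, p.id AS person_id
    FROM building b
    JOIN person p ON p.building_id = b.id
    WHERE b.name LIKE 'Gayan%' OR p.first_name LIKE 'John%'
    ORDER BY 1, 2
""").fetchall()

# The same result as two single-predicate queries combined with UNION,
# which also removes rows matched by both predicates.
union_rows = conn.execute("""
    SELECT b.id AS building_id, p.id AS person_id
    FROM building b
    JOIN person p ON p.building_id = b.id
    WHERE b.name LIKE 'Gayan%'
    UNION
    SELECT b.id AS building_id, p.id AS person_id
    FROM building b
    JOIN person p ON p.building_id = b.id
    WHERE p.first_name LIKE 'John%'
    ORDER BY 1, 2
""").fetchall()

print(or_rows)
print(union_rows)
```

Each half of the UNION carries a single search predicate, which is the property the answer is relying on; whether that actually helps on your data has to be measured on the real server.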

Find Child with Parent having specific information

I am trying to find children whose parent has some specific information stored in different relational tables.
I have four tables as shown below
Search Criteria: Get all "Section" levels whose parent is an "Inventory" level, with an attached user whose name contains the letter 'a' and whose role is 'employee' (please see the LevelsUser table for the relation).
I tried a CTE (common table expression) approach to find the correct Section level, but there I have to pass the level Id as a hard-coded value, so I cannot search all Sections in the table.
WITH LevelsTree AS
(
SELECT Id, ParentLevelId, Level
FROM Levels
WHERE Level='Section' -- here I need to pass a value
UNION ALL
SELECT ls.Id, ls.ParentLevelId, ls.Level
FROM Levels ls
JOIN LevelsTree lt ON ls.Id = lt.ParentLevelId
)
SELECT * FROM LevelsTree
I need to find all sections that match the above criteria.
Please help me here.

For hierarchical checks you need to select from the Levels table and then join back to the same table. Something like this should help:
declare @parentLevelName varchar(20) = 'Inventory';
with cte as (
select distinct
l1.id,
l1.Level
from Levels l1
join Levels l2 on l2.id=l1.ParentLevelId
and l2.Level = @parentLevelName -- use a variable instead of hardcoded `Inventory`
where l1.Level='Section' -- replace `Section` with a @variable containing your value
) select * from cte
join LevelUsers lu on lu.LevelId=cte.id
join Users u on u.Id = lu.UserId
and u.UserName like '%a%' -- this letter check is not efficient
join Role r on r.id=lu.RoleId and r.Role='employee'
Note that the above query selects data only from the 4 tables you have described in the DB schema. However, your original query contains a reference to the HierarchyPosition table, which you haven't described. If you really need to include the HierarchyPosition reference, then specify how it relates to the other 4 tables.
Also note that the condition and u.UserName like '%a%', used to satisfy your requirement that the user name contains the letter 'a', is not efficient: the leading % prevents the use of indexes. If possible, consider changing the requirement to "user name starts with the letter 'a'". Then and u.UserName like 'a%' can use an index on the Users table if one exists.
HTH

Full-text Search on documents and related data mssql

Currently in the middle of building a knowledge base app and am a bit unsure on the best way to store and index the document information.
The user uploads the document and when doing so selects a number of options from dropdown lists (such as category,topic,area..., note these are not all mandatory) they also enter some keywords and a description of the document. At the moment the category (and others) selected is stored as foreign key in the documents table using the id from the categories table.
What we want to be able to do is do a FREETEXTTABLE or CONTAINSTABLE on not only the information within the varchar(max) column where the document is located but also on the category name, topic name and area name etc.
I looked at the option of creating an indexed view, but this wasn't possible due to the LEFT JOIN against the category column. So I'm not sure how to go about this; any ideas would be most appreciated.

I assume that you want to AND the two searches together. For example, find all documents containing the text "foo" AND in the category "Automotive Repair".
Perhaps you don't need to full-text index the additional data and can just use = or LIKE? If the additional data is reasonably small, it may not warrant the complication of full text.
However, if you want to use full text on both, use a stored procedure that pulls the results together for you. The trick here is to stage the results rather than trying to get a result set back straight away.
This is a rough starting point.
-- a staging table variable for the document results
declare @documentResults table (
Id int,
[Rank] int
)
insert into @documentResults
select d.Id, results.[rank]
from containstable (documents, (text), '"foo*"') results
inner join documents d on results.[key] = d.Id
-- now you have all of the primary keys that match the search criteria
-- whittle this list down to only include keys that are in the correct categories
-- a staging table variable for each of the metadata results
declare @categories table (
Id int
)
insert into @categories
select results.[KEY]
from containstable (Categories, (Category), '"Automotive Repair*"') results
declare @topics table (
Id int
)
insert into @topics
select results.[KEY]
from containstable (Topics, (Topic), '"Automotive Repair*"') results
declare @areas table (
Id int
)
insert into @areas
select results.[KEY]
from containstable (Areas, (Area), '"Automotive Repair*"') results
-- the staging tables hold only keys, so return the id columns from documents
select d.text, d.CategoryId, d.TopicId, d.AreaId
from @documentResults r
inner join documents d on d.Id = r.Id
inner join @categories c on c.Id = d.CategoryId
inner join @topics t on t.Id = d.TopicId
inner join @areas a on a.Id = d.AreaId

You could create a new column for your full text index which would contain the original document plus the categories appended as metadata. Then a search on that column could search both the document and the categories simultaneously. You'd need to invent a tagging system that would keep them unique within your document yet the tags would not be likely to be used as search phrases themselves. Perhaps something like:
This is my regular document text. <FTCategory: Automotive Repair> <FTCategory: Transmissions>
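
As a rough illustration of that tagging idea, here is a small Python sketch that builds the combined text to store in the extra full-text column. The function name build_search_text and the FT prefix convention are assumptions made for this example; only the <FTCategory: ...> shape comes from the answer above.

```python
def build_search_text(body, metadata):
    """Append metadata as tags so one full-text column covers text and metadata.

    body: the document text.
    metadata: dict mapping a kind (e.g. "Category") to a list of values.
    Both the function and the tag prefix "FT" are hypothetical.
    """
    tags = [
        f"<FT{kind}: {value}>"
        for kind, values in metadata.items()
        for value in values
    ]
    # Join with single spaces; a document without metadata is returned unchanged.
    return " ".join([body] + tags)


combined = build_search_text(
    "This is my regular document text.",
    {"Category": ["Automotive Repair", "Transmissions"]},
)
print(combined)
```

The combined string would have to be rebuilt whenever the document or its category assignments change, which is the maintenance cost of this approach.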

SQL Server 2005 Performance: Distinct or full table in WHERE IN statement

We have two Tables:
Document: id, title, document_type_id, showon_id
DocumentType: id, name
Relationship: DocumentType hasMany Documents. (Document.document_type_id = DocumentType.id)
We wish to retrieve a list of all document types for one given ShowOn_Id.
We see two possibilities:
SELECT DocumentType.*
FROM DocumentType
WHERE DocumentType.id IN (
SELECT DISTINCT Document.document_type_id FROM Document WHERE showon_id = 42
);
SELECT DocumentType.*
FROM DocumentType
WHERE DocumentType.id IN (
SELECT Document.document_type_id FROM Document WHERE showon_id = 42
);
Our question is: when and if is it better to use the DISTINCT to get the smaller record set versus retrieving the whole table and the IN statement walking the table to the first match. (We guess that's what it does ;-))
Is this different for different databases, is there a common answer?
Or is there a better way of doing it? (We are in .NET land)

You can use a join:
SELECT DISTINCT DocumentType.*
FROM DocumentType
INNER JOIN Document
ON DocumentType.id=Document.document_type_id
WHERE Document.showon_id = 42
I think it's the best way to do it.

For the best performance you should use:
SELECT DISTINCT dt.*
FROM
DocumentType dt
INNER JOIN Document d ON dt.id=d.document_type_id and d.showon_id = 42
Joins are very efficient at bridging multiple tables, whereas the nested query in the WHERE clause needs to perform a separate result selection that then filters down the FROM clause results. The join statement is also much more readable.
I would also put an index on showon_id, in addition to the primary keys and the foreign key relationship.
My answer differs from wmasm's answer only by moving the showon_id filter up into the inner join. For MS SQL 2k5, I think the optimizer is smart enough to do this automatically, but you always want to work with the smallest result set possible. Bringing your filters up into the inner join conditions can limit the number of rows the query has to work with when joining many tables together. If you do this, though, you should understand that the condition is evaluated for every row comparison, so complex filters (such as x like '%a' or function calls) are better left for the WHERE clause, so that the inner joins can first filter out unnecessary comparisons.

Use an EXISTS. It sometimes is faster, but in my opinion, more readable than a DISTINCT and JOIN. Just for kicks, pls reply with the query plan for this query and the JOIN above, and see if anything is different (they may be optimized down to the same plan). If they are the same, I'd recommend the EXISTS as it is closer to a "plain language" description than a JOIN (because you don't want any of the data from Document, etc.)
SELECT whatever
FROM DocumentType dt
WHERE EXISTS( SELECT *
FROM Document
WHERE dt.id = document_type_id
AND showon_id = 42)
To get the query plan (ref: http://msdn.microsoft.com/en-us/library/ms180765(SQL.90).aspx), do:
SET SHOWPLAN_TEXT ON
GO
SELECT ...
GO

From my point of view it should not make any difference inside SQL Server (but who knows how this is implemented).
Think of it this way: to return the result set, the server needs to go into the Document table and retrieve all document_type_id values WHERE showon_id = 42. In the process of retrieving the document_type_ids (e.g. by index seeking) it puts them into a hash table. When this process has finished, the hash table will contain distinct values anyway. After that, query execution goes to the DocumentType table, scans the primary key and probes into the hash table. Note that this depends: it may be more efficient not to use a hash table when the expected row count from the Document table is low compared to DocumentType. In general, though, you get the same query plan as for the query wmasm just suggested.

Follow up on Matt's answer:
I've enabled the query plan and tested the following four different queries that have come up so far:
SELECT DocumentType.* FROM DocumentType WHERE DocumentType.id IN (SELECT DISTINCT Document.document_type_id FROM Document WHERE showon_id = 42);
SELECT DocumentType.* FROM DocumentType WHERE DocumentType.id IN (SELECT Document.document_type_id FROM Document WHERE showon_id = 42);
SELECT DISTINCT DocumentType.* FROM DocumentType INNER JOIN Document ON DocumentType.id=Document.document_type_id WHERE Document.showon_id = 42;
SELECT DocumentType.* FROM DocumentType WHERE EXISTS ( SELECT * FROM Document WHERE DocumentType.id=Document.document_type_id AND showon_id = 42);
The query plan for all four queries turned out to be the same:
|--Hash Match(Right Semi Join, HASH:([Document].[document_type_id])=([DocumentType].[Id]))
|--Hash Match(Inner Join, HASH:([Document].[Title], [Uniq1005])=([Document].[Title], [Uniq1005]), RESIDUAL:([Document].[Title] as [Document].[Title] = [Document].[Title] as [Document].[Title] AND [Uniq1005] = [Uniq1005]))
| |--Index Seek(OBJECT:([Document].[IX_Document_3] AS [Document]), SEEK:([Document].[showon_id]=(1)) ORDERED FORWARD)
| |--Index Scan(OBJECT:([Document].[IX_Document_1] AS [Document]))
|--Table Scan(OBJECT:([DocumentType] AS [DocumentType]))
I am not sure what every line and element means, but it seems that from the performance perspective it does not matter how you construct the query for this kind of problem...
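
The equivalence of the four formulations can also be checked on a toy dataset. This is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server 2005, with made-up rows; it confirms only that the four queries return the same rows, and says nothing about which plan any particular engine picks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DocumentType (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Document (id INTEGER PRIMARY KEY, title TEXT,
                           document_type_id INTEGER, showon_id INTEGER);
    INSERT INTO DocumentType VALUES (1, 'Manual'), (2, 'Invoice'), (3, 'Memo');
    INSERT INTO Document VALUES
        (1, 'a', 1, 42), (2, 'b', 1, 42), (3, 'c', 2, 7), (4, 'd', 3, 42);
""")

queries = {
    "in_distinct": """SELECT dt.* FROM DocumentType dt WHERE dt.id IN
                      (SELECT DISTINCT document_type_id FROM Document WHERE showon_id = 42)""",
    "in_plain": """SELECT dt.* FROM DocumentType dt WHERE dt.id IN
                   (SELECT document_type_id FROM Document WHERE showon_id = 42)""",
    "join_distinct": """SELECT DISTINCT dt.* FROM DocumentType dt
                        INNER JOIN Document d ON dt.id = d.document_type_id
                        WHERE d.showon_id = 42""",
    "exists": """SELECT dt.* FROM DocumentType dt WHERE EXISTS
                 (SELECT * FROM Document d
                  WHERE d.document_type_id = dt.id AND d.showon_id = 42)""",
}

# Sort each result so the comparison ignores row order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

Type 1 has two matching documents, which is the case where the JOIN form needs its DISTINCT and the IN and EXISTS forms do not.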

Using Full-Text Search in SQL Server 2008 across multiple tables, columns

I need to search across multiple columns from two tables in my database using Full-Text Search. The two tables in question have the relevant columns full-text indexed.
The reason I'm opting for Full-text search:
1. To be able to search accented words easily (cafè)
2. To be able to rank according to word proximity, etc.
3. "Did you mean XXX?" functionality
Here is a dummy table structure, to illustrate the challenge:
Table Book
BookID
Name (Full-text indexed)
Notes (Full-text indexed)
Table Shelf
ShelfID
BookID
Table ShelfAuthor
AuthorID
ShelfID
Table Author
AuthorID
Name (Full-text indexed)
I need to search across Book Name, Book Notes and Author Name.
I know of two ways to accomplish this:
1. Using a Full-text Indexed View: This would have been my preferred method, but I can't do this because for a view to be full-text indexed, it needs to be schemabound, not have any outer joins, and have a unique index. The view I will need to get my data does not satisfy these constraints (it contains many other joined tables I need to get data from).
2. Using joins in a stored procedure: The problem with this approach is that I need to have the results sorted by rank. If I am making multiple joins across the tables, SQL Server won't search across multiple fields by default. I can combine two individual CONTAINS queries on the two linked tables, but I don't know of a way to extract the combined rank from the two search queries. For example, if I search for 'Arthur', the results of both the Book query and the Author query should be taken into account and weighted accordingly.

Using FREETEXTTABLE, you just need to design some algorithm to calculate the merged rank from each joined table's result. The example below skews the result towards hits from the Book table.
SELECT b.Name, a.Name, bkt.[Rank] + ISNULL(akt.[Rank], 0)/2 AS [Rank]
FROM Book b
INNER JOIN Author a ON b.AuthorID = a.AuthorID
INNER JOIN FREETEXTTABLE(Book, Name, @criteria) bkt ON b.ContentID = bkt.[Key]
LEFT JOIN FREETEXTTABLE(Author, Name, @criteria) akt ON a.AuthorID = akt.[Key]
ORDER BY [Rank] DESC
Note that I simplified your schema for this example. The ISNULL keeps the merged rank from becoming NULL when the LEFT JOIN finds no author hit.
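
The rank-merging step itself is independent of the database and can be sketched in a few lines of Python. Everything here is hypothetical: merge_ranks, the dictionaries standing in for the two FREETEXTTABLE result sets, and the 0.5 author weight. Unlike the INNER JOIN in the query above, this sketch treats a missing rank as zero on both sides, so a book can surface on an author hit alone.

```python
def merge_ranks(book_hits, author_hits, book_authors, author_weight=0.5):
    """Combine per-table full-text ranks into one ranking of books.

    book_hits:    {book_id: rank} from the search on the Book table
    author_hits:  {author_id: rank} from the search on the Author table
    book_authors: {book_id: author_id} (simplified one-author-per-book schema)
    Returns [(book_id, merged_rank)] sorted by rank, highest first.
    """
    merged = []
    for book_id, author_id in book_authors.items():
        rank = book_hits.get(book_id, 0) + author_weight * author_hits.get(author_id, 0)
        if rank > 0:  # keep only books matched through at least one table
            merged.append((book_id, rank))
    # Highest rank first; ties are broken by book id for a stable order.
    return sorted(merged, key=lambda item: (-item[1], item[0]))


ranking = merge_ranks(
    book_hits={1: 100, 2: 40},
    author_hits={7: 80},
    book_authors={1: 5, 2: 7, 3: 7, 4: 5},
)
print(ranking)
```

Book 3 has no hit in its own name but still appears because its author matched, while book 4 is dropped because neither side matched.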

I had the same problem as you, but it actually involved 10 tables (a Users table and several others for information).
I did my first query using FREETEXT in the WHERE clause for each table, but the query was taking far too long.
I then saw several replies about using FREETEXTTABLE instead and checking for non-null values in the key column for each table, but that also took too long to execute.
I fixed it by using a combination of FREETEXTTABLE and UNION selects:
SELECT Users.* FROM Users INNER JOIN
(SELECT Users.UserId FROM Users INNER JOIN FREETEXTTABLE(Users, (column1, column2), @variableWithSearchTerm) UsersFT ON Users.UserId = UsersFT.[KEY]
UNION
SELECT Table1.UserId FROM Table1 INNER JOIN FREETEXTTABLE(Table1, TextColumn, @variableWithSearchTerm) Table1FT ON Table1.UserId = Table1FT.[KEY]
UNION
SELECT Table2.UserId FROM Table2 INNER JOIN FREETEXTTABLE(Table2, TextColumn, @variableWithSearchTerm) Table2FT ON Table2.UserId = Table2FT.[KEY]
... --same for all tables
) fts ON Users.UserId = fts.UserId
This proved to be dramatically faster.
I hope it helps.

I don't think the accepted answer will solve the problem. If you try to find all the books from a certain author and, therefore, use the author's name (or part of it) as the search criteria, the only books returned by the query will be those which have the search criteria in its own name.
The only way I see around this problem is to replicate the Author's columns that you wish to search by in the Book table and index those columns (or column since it would probably be smart to store the author's relevant information in an XML column in the Book table).

FWIW, in a similar situation our DBA created DML triggers to maintain a dedicated full-text search table. It was not possible to use a materialized view because of its many restrictions.
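
A trigger-maintained search table of that kind might look roughly like the following. This is only a sketch under several assumptions: it runs on SQLite through Python's built-in sqlite3 module (SQL Server trigger syntax differs, using the inserted and deleted pseudo-tables instead of NEW), the book, author and book_search tables are a simplified hypothetical schema, and only two of the triggers a real system would need are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT, author_id INTEGER);
    -- Dedicated table holding one combined, searchable string per book.
    CREATE TABLE book_search (book_id INTEGER PRIMARY KEY, search_text TEXT);

    -- Add a search row whenever a book is inserted.
    CREATE TRIGGER book_after_insert AFTER INSERT ON book
    BEGIN
        INSERT INTO book_search (book_id, search_text)
        SELECT NEW.id, NEW.name || ' ' || a.name
        FROM author a WHERE a.id = NEW.author_id;
    END;

    -- Refresh the search rows of an author's books when the author is renamed.
    CREATE TRIGGER author_after_update AFTER UPDATE OF name ON author
    BEGIN
        UPDATE book_search
        SET search_text = (SELECT b.name || ' ' || NEW.name
                           FROM book b WHERE b.id = book_search.book_id)
        WHERE book_id IN (SELECT id FROM book WHERE author_id = NEW.id);
    END;

    INSERT INTO author VALUES (1, 'Arthur Clarke');
    INSERT INTO book VALUES (10, 'Rendezvous with Rama', 1);
    UPDATE author SET name = 'Arthur C. Clarke' WHERE id = 1;
""")

rows = conn.execute("SELECT book_id, search_text FROM book_search").fetchall()
print(rows)
```

On SQL Server the full-text index would then be created on the single search_text column, and deletes and book updates would need their own triggers.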

I would use a stored procedure. The full-text table functions return a rank which you can sort by. I am not sure how the ranks will be weighted against each other, but I'm sure you could tinker for a while and figure it out. For example:
Select SearchResults.[key], SearchResults.[rank] From FREETEXTTABLE(myTable, *, @searchString) as SearchResults Order By SearchResults.[rank] Desc

This answer is well overdue, but one way to do this if you cannot modify primary tables is to create a new table with the search parameters added to one column.
Then create a full text index on that column and query that column.
Example
SELECT
FT_TBL.[EANHotelID] AS HotelID,
ISNULL(FT_TBL.[Name],'-') AS HotelName,
ISNULL(FT_TBL.[Address1],'-') AS HotelAddress,
ISNULL(FT_TBL.[City],'-') AS HotelCity,
ISNULL(FT_TBL.[StateProvince],'-') AS HotelCountyState,
ISNULL(FT_TBL.[PostalCode],'-') AS HotelPostZipCode,
ISNULL(FT_TBL.[Latitude],0.00) AS HotelLatitude,
ISNULL(FT_TBL.[Longitude],0.00) AS HotelLongitude,
ISNULL(FT_TBL.[CheckInTime],'-') AS HotelCheckinTime,
ISNULL(FT_TBL.[CheckOutTime],'-') AS HotelCheckOutTime,
ISNULL(b.[CountryName],'-') AS HotelCountry,
ISNULL(c.PropertyDescription,'-') AS HotelDescription,
KEY_TBL.RANK
FROM [EAN].[dbo].[tblactivepropertylist] AS FT_TBL INNER JOIN
CONTAINSTABLE ([EAN].[dbo].[tblEanFullTextSearch], FullTextSearchColumn, @s)
AS KEY_TBL
ON FT_TBL.EANHotelID = KEY_TBL.[KEY]
INNER JOIN [EAN].[dbo].[tblCountrylist] b
ON FT_TBL.Country = b.CountryCode
INNER JOIN [EAN].[dbo].[tblPropertyDescriptionList] c
ON FT_TBL.[EANHotelID] = c.EANHotelID
In the code above, [EAN].[dbo].[tblEanFullTextSearch] and FullTextSearchColumn are the new table and the column with the search fields added. You can now query the new table and join it to the tables you want to display data from.
Hope this helps
