Using Full-Text Search in SQL Server 2008 across multiple tables, columns - sql-server

I need to search across multiple columns from two tables in my database using Full-Text Search. The two tables in question have the relevant columns full-text indexed.
The reason I'm opting for Full-text search:
1. To be able to search accented words easily (cafè)
2. To be able to rank according to word proximity, etc.
3. "Did you mean XXX?" functionality
Here is a dummy table structure, to illustrate the challenge:
Table Book
BookID
Name (Full-text indexed)
Notes (Full-text indexed)
Table Shelf
ShelfID
BookID
Table ShelfAuthor
AuthorID
ShelfID
Table Author
AuthorID
Name (Full-text indexed)
I need to search across Book Name, Book Notes and Author Name.
I know of two ways to accomplish this:
Using a Full-text Indexed View: This would have been my preferred method, but I can't do this because for a view to be full-text indexed, it needs to be schemabound, not have any outer joins, have a unique index. The view I will need to get my data does not satisfy these constraints (it contains many other joined tables I need to get data from).
Using joins in a stored procedure: The problem with this approach is that I need to have the results sorted by rank. If I am making multiple joins across the tables, SQL Server won't search across multiple fields by default. I can combine two individual CONTAINS queries on the two linked tables, but I don't know of a way to extract the combined rank from the two search queries. For example, if I search for 'Arthur', the results of both the Book query and the Author query should be taken into account and weighted accordingly.

Using FREETEXTTABLE, you just need to design some algorithm to calculate the merged rank on each joined table result. The example below skews the result towards hits from the book table.
SELECT b.Name, a.Name, bkt.[Rank] + akt.[Rank]/2 AS [Rank]
FROM Book b
INNER JOIN Author a ON b.AuthorID = a.AuthorID
INNER JOIN FREETEXTTABLE(Book, Name, #criteria) bkt ON b.ContentID = bkt.[Key]
LEFT JOIN FREETEXTTABLE(Author, Name, #criteria) akt ON a.AuthorID = akt.[Key]
ORDER BY [Rank] DESC
Note that I simplified your schema for this example.

I had the same problem as you but it actually involved 10 tables (a Users table and several others for information)
I did my first query using FREETEXT in the WHERE clause for each table but the query was taking far too long.
I then saw several replies about using FREETEXTTABLE instead and checking for not nulls values in the key column for each table, but that took also to long to execute.
I fixed it by using a combination of FREETEXTTABLE and UNION selects:
SELECT Users.* FROM Users INNER JOIN
(SELECT Users.UserId FROM Users INNER JOIN FREETEXTTABLE(Users, (column1, column2), #variableWithSearchTerm) UsersFT ON Users.UserId = UsersFT.key
UNION
SELECT Table1.UserId FROM Table1 INNER JOIN FREETEXTTABLE(Table1, TextColumn, #variableWithSearchTerm) Table1FT ON Table1.UserId = Table1FT.key
UNION
SELECT Table2.UserId FROM Table2 INNER JOIN FREETEXTTABLE(Table2, TextColumn, #variableWithSearchTerm) Table2FT ON Table2.UserId = Table2FT.key
... --same for all tables
) fts ON Users.UserId = fts.UserId
This proved to be incredibly much faster.
I hope it helps.

I don't think the accepted answer will solve the problem. If you try to find all the books from a certain author and, therefore, use the author's name (or part of it) as the search criteria, the only books returned by the query will be those which have the search criteria in its own name.
The only way I see around this problem is to replicate the Author's columns that you wish to search by in the Book table and index those columns (or column since it would probably be smart to store the author's relevant information in an XML column in the Book table).

FWIW, in a similar situation our DBA created DML triggers to maintain a dedicated full-text search table. It was not possible to use a materialized view because of its many restrictions.

I would use a stored procedure. The full text method or whatever returns a rank which you can sort by. I am not sure how they will be weighted against eachother, but I'm sure you could tinker for awhile and figure it out. For example:
Select SearchResults.key, SearchResults.rank From FREETEXTTABLE(myColumn, *, #searchString) as SearchResults Order By SearchResults.rank Desc

This answer is well overdue, but one way to do this if you cannot modify primary tables is to create a new table with the search parameters added to one column.
Then create a full text index on that column and query that column.
Example
SELECT
FT_TBL.[EANHotelID] AS HotelID,
ISNULL(FT_TBL.[Name],'-') AS HotelName,
ISNULL(FT_TBL.[Address1],'-') AS HotelAddress,
ISNULL(FT_TBL.[City],'-') AS HotelCity,
ISNULL(FT_TBL.[StateProvince],'-') AS HotelCountyState,
ISNULL(FT_TBL.[PostalCode],'-') AS HotelPostZipCode,
ISNULL(FT_TBL.[Latitude],0.00) AS HotelLatitude,
ISNULL(FT_TBL.[Longitude],0.00) AS HotelLongitude,
ISNULL(FT_TBL.[CheckInTime],'-') AS HotelCheckinTime,
ISNULL(FT_TBL.[CheckOutTime],'-') AS HotelCheckOutTime,
ISNULL(b.[CountryName],'-') AS HotelCountry,
ISNULL(c.PropertyDescription,'-') AS HotelDescription,
KEY_TBL.RANK
FROM [EAN].[dbo].[tblactivepropertylist] AS FT_TBL INNER JOIN
CONTAINSTABLE ([EAN].[dbo].[tblEanFullTextSearch], FullTextSearchColumn, #s)
AS KEY_TBL
ON FT_TBL.EANHotelID = KEY_TBL.[KEY]
INNER JOIN [EAN].[dbo].[tblCountrylist] b
ON FT_TBL.Country = b.CountryCode
INNER JOIN [EAN].[dbo].[tblPropertyDescriptionList] c
ON FT_TBL.[EANHotelID] = c.EANHotelID
In the code above [EAN].[dbo].[tblEanFullTextSearch], FullTextSearchColumn is the new table and column with the fields added, you can now do a query on the new table with joins to the table you want to display the data from.
Hope this helps

Related

Is there an equivalent to OR clause in CONTAINSTABLE - FULL TEXT INDEX

I am trying to find a solution in order to improve the String searching process and I selected FULL-TEXT INDEX Strategy.
However, after implementing it, I still can see there is a performance hit when it comes to search by using multiple strings using multiple Full-Text Index tables with OR clauses.
(E.x. WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%'))
As a solution, I am trying to use CONTAINSTABLE expecting a performance improvement.
Now, I am facing an issue with CONTAINSTABLE when it comes to joining tables with a LEFT JOIN
Please go through the example below.
Query 1
SELECT F.Name,p.*
FROM P.Role PR
INNER JOIN P.Building F ON PR.PID = F.PID
LEFT JOIN CONTAINSTABLE(P.Building,*,'%John%') AS FFTIndex ON F.ID = FFTIndex.[Key]
LEFT JOIN P.Relationship PRSHIP ON PR.id = prship.ToRoleID
LEFT JOIN P.Role PR2 ON PRSHIP.ToRoleID = PR2.ID
LEFT JOIN P.Person p ON pr2.ID = p.PID
LEFT JOIN CONTAINSTABLE(P.Person,FirstName,'%John%') AS PFTIndex ON P.ID = PFTIndex.[Key]
WHERE F.Name IS NOT NULL
This produces the below result.
Query 2
SELECT F.Name,p.*
FROM P.Role PR
INNER JOIN P.Building F ON PR.PID = F.PID
INNER JOIN P.Relationship PRSHIP ON PR.id = prship.ToRoleID
INNER JOIN P.Role PR2 ON PRSHIP.ToRoleID = PR2.ID
INNER JOIN P.Person p ON pr2.ID = p.PID
WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%')
AND F.Name IS NOT NULL
Result
Expectation
To use query 1 in a way that works as the behavior of an SQL SERVER OR clause. As I can understand Query 1's CONTAINSTABLE, joins the data with the building table, and the rest of the results are going to ignore so that the CONTAINSTABLE of the Person table gets data that already contains the keyword filtered from the building table.
If the keyword = Building, I want to match the keyword in both the tables regardless of searching a saved record in both the tables. Having a record in each table is enough.
Summary
Query 2 performs well but is creates a slowness when the words in the indexes are growing. Query 1 seems optimized(When it comes to multiple online resources and MS Documentation),
however, it does not give me the expected output.
Is there any way to solve this problem?
I am not strictly attached to CONTAINSTABLE. Suggesting another optimization method will also be considerable.
Thank you.
Hard to say definitively without your full data set but a couple of options to explore
Remove Invalid % Wildcards
Why are you using '%SearchTerm%'? Does performance improve if you use the search term without the wildcards (%)? If you want a word that matches a prefix, try something like
WHERE CONTAINS (String,'"SearchTerm*"')
Try Temp Tables
My guess is CONTAINS is slightly faster than CONTAINSTABLE as it doesn't calculate a rank, but I don't know if anyone has ever attempted to benchmark it. Either way, I'd try saving off the matches to a temp table before joining up to the rest of the tables. This will allow the optimizer to create a better execution plan
SELECT ID INTO #Temp
FROM YourTable
WHERE CONTAINS (String,'"SearchTerm"')
SELECT *
FROM #Temp
INNER JOIN...
Optimize Full Text Index by Removing Noisy Words
You might find you have some noisy words aka words that reoccur many times in your data that are meaningless like "the" or perhaps some business jargon. Adding these to your stop list will mean your full text index will ignore them, making your index smaller thus faster
The query below will list indexed words with the most frequent at the top
Select *
From sys.dm_fts_index_keywords(Db_Id(),Object_Id('dbo.YourTable') /*Replace with your table name*/)
Order By document_count Desc
This OR That Criteria
For your WHERE CONTAINS(F.*,'%Gayan%') OR CONTAINS(P.FirstName,'%John%') criteria where you want this or that, is tricky. OR clauses generally perform even when using simple equality operators.
I'd try either doing two queries and union the results like:
SELECT * FROM Table1 F
/*Other joins and stuff*/
WHERE CONTAINS(F.*,'%Gayan%')
UNION
SELECT * FROM Table2 P
/*Other joins and stuff*/
WHERE CONTAINS(P.FirstName,'%John%')
OR this is much more work, but you could load all your data into giant denormalized table with all your columns. Then apply a full text index to that table and adjust your search criteria that way. It'd probably be the fastest method searching, but then you'd have to ensure the data is sync between the denormalized table and the underlying normalized tables
SELECT B.*,P.* INTO DenormalizedTable
FROM Building AS B
INNER JOIN People AS P
CREATE FULL TEXT INDEX ft ON DenormalizedTable
etc...

Access 2016 two table join based on a field found in another table

In advance, thank you for your assistance and explanation, I am a novice at best and trying to teach myself.
I have two different types of claims tables, [claim_a] and [claim_b]. Each of these tables capture similar data and have one claim record per row. I have a similar form to display data by record for each table. One of my forms has a grid that captures documents sent and the date sent.
In a third table, [documents], each instance of a document sent is associated with an ssn, claim_num, first, last and clm_type, doc_type and date_sent.
I want to create one query that would output all correspondence sent for both claim tables. I realize I could just do two individual queries but I think this can be done and is not too difficult, I am just missing something and would like to know what. I have tried various join type (inner, left, right) and get various results but nothing that is actually correct. With INNER JOIN, I only got 78 records but am expecting 2,261 and when I did LEFT OUTER, I got 3,070 which totals more than what I had in my [documents] table-I do understand that an outer join with one row in the LEFT table that matches two rows in the RIGHT table will return as two ROWS.
I have also been sure to use parenthesis in my first join statement which based on Google searches seems to be related to Access. I also tried using where clauses too.
I think the problem may be that some of the records in [documents] do not correspond to a record in either claims table. I also just tried joining one claim table to [documents] but even that did not return the expected number of results.
Here are few of the joins I have tried:
Inner Join for one table: My output was missing 4 records for an SSN with 6 total records and I could not figure out why it skipped over the remaining 4. It was only for this SSN. I had other SSNs with more than 6 records.
SELECT documents.date, documents.doc_type,
FROM documents INNER JOIN claim_a ON documents.ssn =
claim_a.ssn WHERE (((documents.clm_type)="Life Only")) OR
(((documents.clm_type)<>("Health")) AND (("Life/ ADB")<>False) AND
(("Life")<>False));
I got 78 records with this join
SELECT documents.date, documents.doc_type,
FROM (documents INNER JOIN claim_a ON documents.ssn =
claim_a.ssn) INNER JOIN claim_b ON documents.ssn =
claim_b.ssn;
I got 3070 records with this join
SELECT documents.date, documents.doc_type,
FROM (documents LEFT OUTER JOIN claim_a ON documents.ssn =
claim_a.ssn) LEFT OUTER JOIN claim_b ON documents.ssn =
claim_b.ssn;
I got the correct number of results with this query but I am concerned it will not work with my Master Form to display header specific information for my form associated with table, claim_b.
SELECT documents.date, documents.doc_type,
FROM documents LEFT JOIN claim_a ON documents.ssn =
claim_a.ssn WHERE (((documents.clm_type)<>""));
I am obviously doing something wrong. Can someone please advise?
Sounds like you need a Union query between the two claims tables to get a list of all claims. Then use the results of theat query to get the document list.
Union query
Select ssn from claim_a
Union all select ssn from claim_b
Save this with a name like SSN_List, then join it to Documents in another query
Select * from Documents
left join SSN_List on Documents.ssn=SSN_List.ssn
And of course change the 2nd query as needed to get the information you need from Documents.
This can probably be done in one query, but I find it easier to understand and use the 2 step approach.

Find matching records of two different tables in SQL Server

I have two tables in one of them a seller saves a record for a product he is selling. and in another table buyers save what they need to buy.
I need to get a list of user ids (uid field) from buyers table which matches a specific product on sales table. this is what I have written:
select n.[uid]
from needs n
left join ads(getdate()) a
on n.mid=a.mid
and a.[year] between n.from_year and n.to_year
and a.price between n.from_price and n.to_price
and n.[uid]=a.[uid]
and a.pid=n.pid
Well I need to use a where clause to eliminate those records which doesn't match. as I think all of these conditions are defined with ON must be defined with a where clause. but joining needs at least one ON clause. may be I shouldn't join two tables? what can I do?
There is an important difference between LEFT JOIN and JOIN, or more accurately OUTER and INNER joins respectively.
Inner joins require that both sides of the join match. In other words, if you:
had a table representing People
you had another table representing Automobiles
and each automobile had a PersonId
and joined these tables using ON with the PersonId
using LEFT (OUTER) JOIN would return all people, even those without automobiles. INNER JOIN only returns the people with vehicles.
This article may help: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html

Search in multiple tables with Full-Text

I'm trying to make a detailed search with asp and SQL Server Full-text.
When a keyword submitted, I need to search in multiple tables. For example,
Table - Members
member_id
contact_name
Table - Education
member_id
school_name
My query;
select mem.member_id, mem.contact_name, edu.member_id, edu.school_name from Members mem FULL OUTER JOIN Education edu on edu.member_id=mem.member_id where CONTAINS (mem.contact_name, '""*"&keyword&"*""') or CONTAINS (edu.school_name, '""*"&keyword&"*""') order by mem.member_id desc;
This query works but it takes really long time to execute.
Image that the keyword is Phill; If mem.contact_name matches then list it, or if edu.school_name matches, list the ones whose education match the keyword.
I hope I could explain well :) Sorry for my english though.
Perhaps try an indexed view containing the merged dataset- you can add the fulltext index there instead of the individual tables, and it's further extensible to as many tables as you need down the line. Only trick, of course, is the space...
This is what I would do for my multi table full text search.
Not exact but it will give basic idea. the key thing is to give table vise contain with OR condition.
DECLARE #SearchTerm NVARCHAR(250)
SET #SearchTerm = '"Texas*"'
SELECT * FROM table1
JOIN table2 on table1.Id = table2.FKID
WHERE (
(#SearchTerm = '""') OR
CONTAINS((table1.column1, table1.column2, table1.column3), #SearchTerm) OR
CONTAINS((table2.column1, table2.column2), #SearchTerm)
)
Couple of points I don't understand that will be affecting your speed.
Do you really need a full outer join? That's killing you. It looks like these tables are one to one. In that case can't you make it an inner join?
Can't you pass a column list to contains like so:
SELECT mem.member_id,
mem.contact_name,
edu.member_id,
edu.school_name
FROM members mem
INNER JOIN education edu ON edu.member_id = mem.member_id
WHERE Contains((mem.contact_name,edu.school_name),'*keyword*')
ORDER BY mem.member_id DESC
Further info about contains.

SQL Server 2005 Performance: Distinct or full table in WHERE IN statement

We have two Tables:
Document: id, title, document_type_id, showon_id
DocumentType: id, name
Relationship: DocumentType hasMany Documents. (Document.document_type_id = DocumentType.id)
We wish to retrieve a list of all document types for one given ShowOn_Id.
We see two possiblities:
SELECT DocumentType.*
FROM DocumentType
WHERE DocumentType.id IN (
SELECT DISTINCT Document.document_type_id FROM Document WHERE showon_id = 42
);
SELECT DocumentType.*
FROM DocumentType
WHERE DocumentType.id IN (
SELECT Document.document_type_id FROM Document WHERE showon_id = 42
);
Our question is: when and if is it better to use the DISTINCT to get the smaller record set versus retrieving the whole table and the IN statement walking the table to the first match. (We guess that's what it does ;-))
Is this different for different databases, is there a common answer?
Or is there a better way of doing it? (We are in .NET land)
You can use a join:
SELECT DISTINCT DocumentType.*
FROM DocumentType
INNER JOIN Document
ON DocumentType.id=Document.document_type_id
WHERE Document.showon_id = 42
I think it's the best way to do it.
For the best performance you should use:
SELECT DISTINCT dt.*
FROM
DocumentType dt
INNER JOIN Document d ON dt.id=d.document_type_id and d.showon_id = 42
Joins are very efficient at bridging multiple tables where as the nested query in the Where clause will need to perform a separate result selection that will filter down the From clause results. The join statement is also much more readable.
I would also put an index on showon_id, in addition to the primary keys and foreign key relationship.
My answer differs from wmasm's answer only by moving the showon_id filter up to the inner join. For MS SQL 2k5, I think the interpreter is smart enough to do this automatically, but you always want to work with the smallest result set possible. Bringing your filters up to inner join statements can limit the number of rows the query has to work with when joining many tables together. If you do this though, you should understand that this happens for every row comparison so complex filters (such as like x = '%a' or function calls) are better left for the Where clause so that the inner joins may filter out unnecessary comparisons.
Use an EXISTS. It sometimes is faster, but in my opinion, more readable than a DISTINCT and JOIN. Just for kicks, pls reply with the query plan for this query and the JOIN above, and see if anything is different (they may be optimized down to the same plan). If they are the same, I'd recommend the EXISTS as it is closer to a "plain language" description than a JOIN (because you don't want any of the data from Document, etc.)
SELECT whatever
FROM DocumentType dt
WHERE EXISTS( SELECT *
FROM Document
WHERE dt.id = document_type_id
AND showon_id = 42)
To get the query plan (ref: http://msdn.microsoft.com/en-us/library/ms180765(SQL.90).aspx), do:
SET SHOWPLAN_TEXT ON
GO
SELECT ...
GO
From my point of view it should not make any difference inside SQL Server (but who knows how this is implemented).
Think of it this way: to return the resultset the server needs to go into the Document table and retrieve all document_type_id WHERE showon_id = 42. In the process of retrieving the document_type_ids (e.g. by index seeking) it puts them into a hash table. When this process has finished the hash table will contain distinct values anyway. After that the query execution goes inside the Document_Type table, scans the primary key and probes into the hash table. Note that this depends, e.g. maybe it's more efficient to not use a hash table, when the expected row count from the Document table it low compared to Document_Type, but in general you get the same query plan as for the query wmasm just suggested.
Follow up on Matt's answer:
I've enabled the query plan and tested the following four different queries that have come up so far:
SELECT DocumentType.* FROM DocumentType WHERE DocumentType.id IN (SELECT DISTINCT Document.document_type_id FROM Document WHERE showon_id = 42);
SELECT DocumentType.* FROM DocumentType WHERE DocumentType.id IN (SELECT Document.document_type_id FROM Document WHERE showon_id = 42);
SELECT DISTINCT DocumentType.* FROM DocumentType INNER JOIN Document ON DocumentType.id=Document.document_type_id WHERE Document.showon_id = 42;
SELECT DocumentType.* FROM DocumentType WHERE EXISTS ( SELECT * FROM Document WHERE DocumentType.id=Document.document_type_id AND showon_id = 42);
The query plan for all four queries turned out to be the same:
|--Hash Match(Right Semi Join, HASH:([Document].[document_type_id])=([DocumentType].[Id]))
|--Hash Match(Inner Join, HASH:([Document].[Title], [Uniq1005])=([Document].[Title], [Uniq1005]), RESIDUAL:([Document].[Title] as [Document].[Title] = [Document].[Title] as [Document].[Title] AND [Uniq1005] = [Uniq1005]))
| |--Index Seek(OBJECT:([Document].[IX_Document_3] AS [Document]), SEEK:([Document].[showon_id]=(1)) ORDERED FORWARD)
| |--Index Scan(OBJECT:([Document].[IX_Document_1] AS [Document]))
|--Table Scan(OBJECT:([DocumentType] AS [DocumentType]))
I am not sure what every line and element means, but it seems that from the performance perspective it does not matter how you construct the query for this kind of problem...

Resources