How to use FULL-TEXT SEARCH in H2 Database?

Consider the following example:
CREATE ALIAS IF NOT EXISTS FT_INIT FOR "org.h2.fulltext.FullText.init";
CALL FT_INIT();
DROP TABLE IF EXISTS TEST;
CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR);
INSERT INTO TEST VALUES(1, 'Hello World');
CALL FT_CREATE_INDEX('PUBLIC', 'TEST', NULL);
and I have executed the following query:
SELECT * FROM FT_SEARCH('Hello', 0, 0);
But this query returns "PUBLIC"."TEST" WHERE "ID"=1.
Do I have to execute "PUBLIC"."TEST" WHERE "ID"=1 again to get the record containing the word 'Hello'?
What is the query to find all records containing 'ell' with FT_SEARCH, similar to LIKE '%ell%', in H2 Native Full-Text Search?

Yes, each row returned by FT_SEARCH identifies a schema, table, and row in which the search text was found. The search is case-insensitive, and the text parameter to FT_SEARCH may include more than one word, in which case a row must contain all of them. For example,
DELETE FROM TEST;
INSERT INTO TEST VALUES(1, 'Hello World');
INSERT INTO TEST VALUES(2, 'Goodbye World');
INSERT INTO TEST VALUES(3, 'Hello Goodbye');
CALL FT_REINDEX();
SELECT * FROM FT_SEARCH('hello goodbye', 0, 0);
returns only row three:
QUERY                          SCORE
"PUBLIC"."TEST" WHERE "ID"=3   1.0
Also note that FT_SEARCH_DATA may be used to retrieve the data itself. For example,
SELECT T.* FROM FT_SEARCH_DATA('hello', 0, 0) FT, TEST T
WHERE FT.TABLE='TEST' AND T.ID=FT.KEYS[0];
returns both rows containing the keyword:
ID   NAME
1    Hello World
3    Hello Goodbye
Apache Lucene supports wildcard searches, although leading wildcards (e.g. *ell) tend to be expensive.

Do I have to execute "PUBLIC"."TEST" WHERE "ID"=1 again to get the record containing the word 'Hello'?
Yes, unless you use a join as described by trashgod. The reason is that rows are usually much larger than just two words. For example, a row may contain a CLOB with a whole document. If the result of the fulltext search contained the data, the search would be much slower.
What is the query to find all records containing 'ell' with FT_SEARCH, similar to LIKE '%ell%', in H2 Native Full-Text Search?
The native fulltext search can't do that directly, because it only indexes whole words. (By the way: does Google support searches if you only know part of a word? Apache Lucene does support it.) For H2 there is a workaround: first search the words table (FT.WORDS) for matching words, and then run a regular search for those words.
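The two-step workaround can be pictured with a toy inverted index. This is only a sketch in Python, not H2's implementation: the dictionary `index` stands in for the words table, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Sample rows, mirroring the TEST table from the first answer.
rows = {1: "Hello World", 2: "Goodbye World", 3: "Hello Goodbye"}

# Toy inverted index: upper-cased whole word -> set of row ids.
# (Hypothetical stand-in for the words table, not H2's actual structure.)
index = defaultdict(set)
for row_id, text in rows.items():
    for word in re.findall(r"\w+", text.upper()):
        index[word].add(row_id)


def search_words(query):
    """Whole-word search: a row must contain every word of the query."""
    words = re.findall(r"\w+", query.upper())
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


def search_fragment(fragment):
    """Substring search: scan the word list first, then collect the rows."""
    fragment = fragment.upper()
    matching_words = [w for w in index if fragment in w]
    return set().union(*(index[w] for w in matching_words))


print(sorted(search_words("hello goodbye")))  # [3]
print(sorted(search_words("ell")))            # []
print(sorted(search_fragment("ell")))         # [1, 3]
```

The whole-word lookup is a direct dictionary access, while the fragment search has to scan every distinct word. That scan is cheap compared with scanning every row, which is why going through the word list first is worthwhile.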

Related

SQL Server oddity with full-text indexing and case sensitivity and numbers

This problem may be unique to our server, but I can't tell from the symptoms where the issue may lie.
I have a field (searchtitle) in a table (fsItems) that has a full-text index turned ON. For the record with primary key (IDItem) 704629 the content of this field is "TEST AFA 0 TEST".
Surprisingly, the following query returns no results:
SELECT * FROM fsItems WHERE CONTAINS(searchtitle,'AFA') AND IDItem = 704629
However, if I change the content to be "TEST afa 0 TEST" or "TEST AFA O TEST" (capital "O" instead of zero) the query returns the record. (It also returns the record if I change the content to "TEST AFB 0 TEST" and the CONTAINS argument to 'AFB'.)
At first I thought maybe AFA was some kind of stop word, but that wouldn't explain why changing zero to upper-case "O" returns the proper result.
Any idea what is going on here?
Thanks for any suggestions
Very interesting little quirk. It appears SQL Server is treating "AFA 0" as a single "word". My guess is that this is an issue with the word breakers configured for standard English. It appears you can adjust them manually, but it doesn't look simple or intuitive. See Microsoft's how-to documentation.
Identifying Words in Full-text Index
The script below lists every word in a full-text index. If you run it against your table, you'll see the word "AFA 0" in the display_term column. Side note: this script is also very useful for optimizing full-text indexes by identifying "noisy" words to add to your stop list.
Select *
From sys.dm_fts_index_keywords(Db_Id(),Object_Id('dbo.tbl_fulltext_test') /*Replace with your table name*/)
Order By document_count Desc
Full SQL Used to Identify the Issue
CREATE TABLE tbl_fulltext_test
(ID int constraint PK_fulltext_test primary key identity (1,1)
,String Varchar(1000)
)
Create Fulltext Catalog ct_test
With Accent_Sensitivity = Off
Create Fulltext Stoplist sl_test
From System Stoplist;
Create Fulltext Index On tbl_fulltext_test(String)
Key Index PK_fulltext_test On ct_test
With Stoplist = sl_test,Change_Tracking Auto;
INSERT INTO tbl_fulltext_test
VALUES
('TEST AFA 0 TEST') /*Zero*/
,('TEST afa 0 TEST') /*Zero*/
,('TEST AFB 0 TEST') /*AFB*/
,('TEST AFA O TEST') /*Letter O*/
/*Returns rows 2 and 4*/
SELECT *
FROM tbl_fulltext_test
WHERE CONTAINS (String,'AFA')
/*Returns row 1*/
SELECT *
FROM tbl_fulltext_test
WHERE CONTAINS (String,'"AFA 0"')

SQL Server FTS is returning inaccurate results

We have a column in our table that is indexed for full text search. In it we store values such as
<zNSIC>1010</zNSIC>
The value within the tags could be anything so then we create a search query similar to...
SELECT [KEY]
FROM CONTAINSTABLE(SearchTable, SearchText, '("<zNSIC>15*")')
and it should return any record where the SearchText column has the zNSIC tag with a value like 1500, 1501, 1502, etc. This works; however, I'm also getting back a couple of records where there is no zNSIC tag starting with 15. The closest match I can find in those two records is
<zNSIC>DM15</zNSIC>
I can't figure out why it's considering the DM in that value as a match. Any ideas? This is SQL Server 2014.
The "15" is parsed out as a separate phrase as can be seen here:
select keyword, special_term, display_term, source_term
from sys.dm_fts_parser('("<zNSIC>15*")', 1033, 0, 0);
keyword                 special_term  display_term  source_term
0x007A006E007300690063  Exact Match   znsic         <zNSIC>15
0x00310035              Exact Match   15            <zNSIC>15
0x006E006E00310035      Exact Match   nn15          <zNSIC>15

Full text search syntax error

I created a full text search with a catalog and index and the contains query works fine when I run a query with one word like below.
SELECT Name
FROM dbo.Gifts
WHERE CONTAINS(Name, 'gift')
It returns 'test gift'.
I have only one row in the table, and the data in the Name column looks like this: 'test gift'.
But when I run the CONTAINS query with this statement:
SELECT Name
FROM dbo.Gifts
WHERE CONTAINS(Name, 'test gift')
It throws an error saying: Syntax error near 'gift' in the full-text search condition 'test gift'.
I thought contains could query phrases and multiple words that match and sound alike?
You need double quotes to handle that space, keeping in mind that you are then searching for the entire phrase and not for the individual words. The following query would find "test gift" but not "gift test":
SELECT Name
FROM dbo.Gifts
WHERE CONTAINS(Name, '"test gift"')
Or, if you want to search for the words individually, it would be:
SELECT Name
FROM dbo.Gifts
WHERE CONTAINS(Name, '"test" AND "gift"')
This second query should match a field containing "gift test" as well as "test gift".
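When the search text comes from user input, the quoting can be done in application code before the string is passed to CONTAINS as a parameter. A minimal Python sketch, assuming whitespace-separated input; `contains_condition` is a hypothetical helper, and dropping embedded double quotes is a conservative simplification rather than documented escaping.

```python
def contains_condition(user_text, whole_phrase=False):
    """Build a search condition string for CONTAINS from free-form input.

    Hypothetical helper: embedded double quotes are dropped, not escaped.
    """
    words = user_text.replace('"', " ").split()
    if not words:
        raise ValueError("search text is empty")
    if whole_phrase:
        # One quoted phrase: the words must appear together, in this order.
        return '"' + " ".join(words) + '"'
    # Each word quoted separately and joined with AND: any order matches.
    return " AND ".join('"' + w + '"' for w in words)


print(contains_condition("test gift", whole_phrase=True))  # "test gift"
print(contains_condition("test gift"))                     # "test" AND "gift"
```

The resulting string is meant to be bound as a query parameter, not concatenated into the SQL text.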

NVarchar Prefix causes wrong index to be selected

I have an entity framework query that has this at the heart of it:
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets
WHERE widgets.[SomeOtherOrderId] = N'SOME VALUE HERE'
The execution plan for this chooses an index that is a composite of three columns. This takes 10 to 12 seconds.
However, there is an index that is just [SomeOtherOrderId] with a few other columns in the "include". That is the index that should be used. And when I run the following queries it is used:
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets
WHERE widgets.[SomeOtherOrderId] = CAST(N'SOME VALUE HERE' AS VARCHAR(200))
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets
WHERE widgets.[SomeOtherOrderId] = 'SOME VALUE HERE'
These return instantly, and they use the index on just SomeOtherOrderId.
So my problem is that I can't really change how Entity Framework generates the query.
Is there something I can do from an indexing point of view that could cause the correct index to be selected?
As far as I know, since version 4.0, EF doesn't generate unicode parameters for non-unicode columns. But you can always force non-unicode parameters by DbFunctions.AsNonUnicode (prior to EF6, DbFunctions is EntityFunctions):
from o in db.WidgetOrder
where o.SomeOtherOrderId == DbFunctions.AsNonUnicode(param)
select o
Try something like:
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets WITH (INDEX(Target_Index_Name))
WHERE widgets.[SomeOtherOrderId] = N'SOME VALUE HERE'
This table hint tells SQL Server explicitly which index to use to get the results.

SQL XML query assistance

I've got a table in a SQL Server 2008 database with an nvarchar(MAX) column containing XML data. The data represents search criteria. Here's what the XML looks like for search criteria with one top-level "OR" group containing one single criterion and a nested two-criterion "AND" group.
<?xml version="1.0" encoding="utf-16"?>
<SearchCriterionGroupArgs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<SingleCriteria>
<SearchCriterionSingleArgs>
<Operator>Equals</Operator>
<Value>test</Value>
<FieldIDs>
<int>1026</int>
<int>478</int>
</FieldIDs>
<EntityID>92</EntityID>
</SearchCriterionSingleArgs>
</SingleCriteria>
<GroupCriteria>
<SearchCriterionGroupArgs>
<SingleCriteria>
<SearchCriterionSingleArgs>
<Operator>GreaterThan</Operator>
<Value>2010-01-23</Value>
<FieldIDs>
<int>1017</int>
</FieldIDs>
<EntityID>92</EntityID>
</SearchCriterionSingleArgs>
<SearchCriterionSingleArgs>
<Operator>LessThan</Operator>
<Value>2013-01-23</Value>
<FieldIDs>
<int>1018</int>
</FieldIDs>
<EntityID>92</EntityID>
</SearchCriterionSingleArgs>
</SingleCriteria>
<GroupCriteria />
<EntityID>92</EntityID>
<LogicalOperator>AND</LogicalOperator>
</SearchCriterionGroupArgs>
</GroupCriteria>
<EntityID>92</EntityID>
<LogicalOperator>OR</LogicalOperator>
</SearchCriterionGroupArgs>
Given an input set of FieldID values, I need to search the table to find whether there are any records whose search criteria refer to one of those values (these are represented in the "int" nodes under the "FieldIDs" nodes).
By running this query:
select CAST(OptionalConditions as xml).query('//FieldIDs')
from tblMyTable
I get the results:
<FieldIDs>
<int>1026</int>
<int>478</int>
</FieldIDs>
<FieldIDs>
<int>1017</int>
</FieldIDs>
<FieldIDs>
<int>1018</int>
</FieldIDs>
(currently there's only one record in the table with xml data in it.)
But I'm just getting started with this stuff and I don't know what the notation would be to check those lists for the existence of any of an arbitrary set of FieldIDs. I don't need to retrieve any particular nodes, just true or false for whether the input field IDs are referenced anywhere in the search.
Thanks for your help!
Edit: using Ranon's solution, I got it working using a query like this:
SELECT *
FROM myTable
WHERE CAST(OptionalConditions as xml).exist('//FieldIDs/int[.=(1019,111,1018)]') = 1
Fetch all FieldIDs and compare them with the set of IDs to check against. XQuery's = operator uses set-based semantics, so if any one of the IDs on the left side equals any one on the right, the expression evaluates to true.
//FieldIDs/int = (42, 478)
As "478" is a FieldID, this query will return true, even though "42" is not one.
I'm not sure whether you will be able to cast the result to some SQL Server boolean type, as I haven't got one running, but you can easily try it out yourself.
If you're also interested in the nodes contained, you could use this query:
//FieldIDs/int[. = (42,478)]
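For readers unfamiliar with XQuery, the set-based comparison behaves like an "any match" test. The Python sketch below mimics it with the standard library on a shortened copy of the XML from the question. It only illustrates the semantics and is not how SQL Server evaluates the expression; `references_any` is a made-up name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the search-criteria XML from the question.
SAMPLE = """
<SearchCriterionGroupArgs>
  <SingleCriteria>
    <SearchCriterionSingleArgs>
      <FieldIDs><int>1026</int><int>478</int></FieldIDs>
    </SearchCriterionSingleArgs>
  </SingleCriteria>
  <GroupCriteria>
    <SearchCriterionGroupArgs>
      <SingleCriteria>
        <SearchCriterionSingleArgs>
          <FieldIDs><int>1017</int></FieldIDs>
        </SearchCriterionSingleArgs>
        <SearchCriterionSingleArgs>
          <FieldIDs><int>1018</int></FieldIDs>
        </SearchCriterionSingleArgs>
      </SingleCriteria>
    </SearchCriterionGroupArgs>
  </GroupCriteria>
</SearchCriterionGroupArgs>
"""


def references_any(xml_text, wanted_ids):
    """True if any FieldIDs/int value, at any depth, is in wanted_ids."""
    root = ET.fromstring(xml_text.strip())
    found = {int(node.text) for node in root.findall(".//FieldIDs/int")}
    # Same idea as XQuery's general comparison: one common value is enough.
    return not found.isdisjoint(wanted_ids)


print(references_any(SAMPLE, {42, 478}))  # True
print(references_any(SAMPLE, {42}))       # False
```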
