SQL Server FTS is returning inaccurate results - sql-server

We have a column in our table that is indexed for full text search. In it we store values such as
<zNSIC>1010</zNSIC>
The value within the tags could be anything so then we create a search query similar to...
SELECT KEY
FROM CONTAINSTABLE(SearchTable, SearchText, '("<zNSIC>15*")')
and it should return any record where the SearchText column has the zNSIC tag with a value like 1500, 1501, 1502, etc. This is working however I'm also getting back a couple of records where there is no zNSIC tag starting with 15. The closest match I can find in the two records are
<zNSIC>DM15</zNSIC>
I can't figure out why it's considering the DM in that value as a match. Any ideas? This is SQL Server 2014.

The "15" is parsed out as a separate phrase as can be seen here:
select keyword, special_term, display_term, source_term
from sys.dm_fts_parser('("<zNSIC>15*")', 1033, 0, 0);
keyword special_term display_term source_term
0x007A006E007300690063 Exact Match znsic <zNSIC>15
0x00310035 Exact Match 15 <zNSIC>15
0x006E006E00310035 Exact Match nn15 <zNSIC>15

Related

WHERE clause in MDX for Dynamic Management Views

I am trying to query the metadata from a tabular cube using DMVs. I was able to get it to work without a where clause, but I can't seem to get a where clause to work. Any Advice?
Here is the code that works:
SELECT
[MEASURE_CAPTION] AS [Measure]
,[MEASURE_IS_VISIBLE] AS [Visable]
,[DESCRIPTION] AS [Description]
,[MEASURE_DISPLAY_FOLDER] AS [Display Folder]
,[EXPRESSION] AS [Calculation]
FROM $SYSTEM.MDSCHEMA_MEASURES
The WHERE clauses that I have tried are:
WHERE ([MEASURE_IS_VISIBLE].[members].[true])
I get the following error:
The dot expression is not allowed in the context at line 9, column 1.
Also:
WHERE [MEASURE_IS_VISIBLE] = TRUE
I get the following error:
Error: A Boolean expression is not allowed in the context at line 9, column 7.
I have tried many veriations on these themes, but always get the same result. I am not at all familiar with how MDX works, so any assistance would be appreciated.
MEASURE_IS_VISIBLE is a boolean column and must be filtered as such. The following query filters the results to only those with a MEASURE_IS_VISIBLE value of true. Change this to WHERE NOT MEASURE_IS_VISIBLE for rows where this is false. You can see the datatypes for this DMV from the documentation here.
SELECT
[MEASURE_CAPTION] AS [Measure]
,[MEASURE_IS_VISIBLE] AS [Visable]
,[DESCRIPTION] AS [Description]
,[MEASURE_DISPLAY_FOLDER] AS [Display Folder]
,[EXPRESSION] AS [Calculation]
FROM $SYSTEM.MDSCHEMA_MEASURES
WHERE MEASURE_IS_VISIBLE

SQL Server 2014 - XQuery - get comma-separated List

I have a database table in SQL Server 2014 with only an ID column (int) and a column xmldata of type XML.
This xmldata column contains for example:
<book>
<title>a nice Novel</title>
<author>Maria</author>
<author>Peter</author>
</book>
As expected, I have multiple books, therefore multiple rows with xmldata.
I now want to execute a query for all books, where Peter is an Author. I tried this in some xPath2.0 testers and got to the conclusion that:
/book/author/concat(text(), if(position() != last())then ',' else '')
works.
If you try to port this success into SQL Server 2014 Express it looks like this, which is correctly escaped syntax etc.:
SELECT id
FROM books
WHERE 'Peter' IN (xmldata.query('/book/author/concat(text(), if(position() != last())then '','' else '''')'))
SQL Server however does not seem to support a construction like /concat(...) because of:
The XQuery syntax '/function()' is not supported.
I am at a loss then however, why /text() would work in:
SELECT id, xmldata.query('/book/author/text()')
FROM books
which it does.
My constraints:
I am bound to use SQL Server
I am bound to xpath or something else that can be "injected" as the statement above (if the structure of the xml or the database changes, the xpath above could be changed isolated and the application logic above that constructs the Where clause will not be touched) SEE EDIT
Is there a way to make this work?
regards,
BillDoor
EDIT:
My second constraint boils down to this:
An Application constructs the Where clause by
expression <operator> value(s)
expression is stored in a database and is mapped by the xmlTag eg.:
| tokenname| querystring
| "author" | "xmldata.query(/book/author/text())"
the values are presented by the Requesting user. so if the user asks for the author "Peter" with operator "EQUALS" the application constructs:
xmaldata.query(/book/author/text()) = "Peter"
as where clause.
If the customer now decides that author needs to be nested in an <authors> element, i can simply change the expression in the construction-database and the whole machine keeps running without any changes to code, simply manageable.
So i need a way to achieve that
<xPath> <operator> "Peter"
or any other combination of this three isolated components (see above: "Peter" IN <xPath>...) gets me all of Peters' books, even if there are multiple unsorted authors.
This would not suffice either (its not sqlserver syntax, but you get the idea):
WHERE xmldata.exist('/dossier/client[text() = "$1"]', "Peter") = 1;
because the operator is still nested in the expression, i could not request <> "Peter".
I know this is strange, please don't question the concept as a whole - it has a history :/
EDIT: further clarification:
The filter-rules come into the app in an XML structure basically:
Operator: "EQ"
field: "name"
value "Peter"
evaluates to:
expression = lookupExpressionForField("name") --> "table2.xmldata.value('book/author/name[1]', 'varchar')"
operator = lookUpOperatorMapping("EQ") --> "="
value = FormatValues("Peter") --> "Peter" (if multiple values are passed FormatValues cosntructs a comma seperated list)
the application then builds:
- constructClause(String expression,String operator,String value)
"table2.xmldata.value('book/author/name[1]', 'varchar')" + "=" + "Peter"
then constructs a Select statement with the result as WHERE clause.
it does not build it like this, unescaped, unfiltered for injection etc, but this is the basic idea.
i can influence how the input is Transalted, meaning I can implement the methods:
lookupExpressionForField(String field)
lookUpOperatorMapping(String operator)
Formatvalues(List<String> values) | Formatvalues(String value)
constructClause(String expression,String operator,String value)
however i choose to do, i can change the parameter types, I can freely implement them. The less the better of course. So simply constructing a comma-seperated list with xPath would be optimal (like if i could somewhere just tick "enable /function()-syntax in xPath" in sqlserver and the /concat(if...) would work)
How about something like this:
SET NOCOUNT ON;
DECLARE #Books TABLE (ID INT NOT NULL IDENTITY(1, 1) PRIMARY KEY, BookInfo XML);
INSERT INTO #Books (BookInfo)
VALUES (N'<book>
<title>a nice Novel</title>
<author>Maria</author>
<author>Peter</author>
</book>');
INSERT INTO #Books (BookInfo)
VALUES (N'<book>
<title>another one</title>
<author>Bob</author>
</book>');
SELECT *
FROM #Books bk
WHERE bk.BookInfo.exist('/book/author[text() = "Peter"]') = 1;
This returns only the first "book" entry. From there you can extract any portion of the XML field using the "value" function.
The "exist" function returns a boolean / BIT. This will scan through all "author" nodes within "book", so there is no need to concat into a comma-separated list only for use in an IN list, which wouldn't work anyway ;-).
For more info on the "value" and "exist" functions, as well as the other functions for use with XML data, please see:
xml Data Type Methods

NVarchar Prefix causes wrong index to be selected

I have an entity framework query that has this at the heart of it:
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets
WHERE widgets.[SomeOtherOrderId] = N'SOME VALUE HERE'
The execution plan for this chooses an index that is a composite of three columns. This takes 10 to 12 seconds.
However, there is an index that is just [SomeOtherOrderId] with a few other columns in the "include". That is the index that should be used. And when I run the following queries it is used:
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets
WHERE widgets.[SomeOtherOrderId] = CAST(N'SOME VALUE HERE' AS VARCHAR(200))
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets
WHERE widgets.[SomeOtherOrderId] = 'SOME VALUE HERE'
This returns instantly. And it uses the index that is just SomeOtherOrderId
So, my problem is that I can't really change how Entity Framework makes the query.
Is there something I can do from an indexing point of view that could cause the correct index to be selected?
As far as I know, since version 4.0, EF doesn't generate unicode parameters for non-unicode columns. But you can always force non-unicode parameters by DbFunctions.AsNonUnicode (prior to EF6, DbFunctions is EntityFunctions):
from o in db.WidgetOrder
where o.SomeOtherOrderId == DbFunctions.AsNonUnicode(param)
select o
Try something like ....
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets WITH (INDEX(Target_Index_Name))
WHERE widgets.[SomeOtherOrderId] = N'SOME VALUE HERE'
This query hint sql server explicitly what index to use to get resutls.

SQL XML query assistance

I've got a table in a SQL Server 2008 database with an nvarchar(MAX) column containing XML data. The data represents search criteria. Here's what the XML looks like for search criteria with one top-level "OR" group containing one single criterion and a nested two-criterion "AND" group.
<?xml version="1.0" encoding="utf-16"?>
<SearchCriterionGroupArgs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<SingleCriteria>
<SearchCriterionSingleArgs>
<Operator>Equals</Operator>
<Value>test</Value>
<FieldIDs>
<int>1026</int>
<int>478</int>
</FieldIDs>
<EntityID>92</EntityID>
</SearchCriterionSingleArgs>
</SingleCriteria>
<GroupCriteria>
<SearchCriterionGroupArgs>
<SingleCriteria>
<SearchCriterionSingleArgs>
<Operator>GreaterThan</Operator>
<Value>2010-01-23</Value>
<FieldIDs>
<int>1017</int>
</FieldIDs>
<EntityID>92</EntityID>
</SearchCriterionSingleArgs>
<SearchCriterionSingleArgs>
<Operator>LessThan</Operator>
<Value>2013-01-23</Value>
<FieldIDs>
<int>1018</int>
</FieldIDs>
<EntityID>92</EntityID>
</SearchCriterionSingleArgs>
</SingleCriteria>
<GroupCriteria />
<EntityID>92</EntityID>
<LogicalOperator>AND</LogicalOperator>
</SearchCriterionGroupArgs>
</GroupCriteria>
<EntityID>92</EntityID>
<LogicalOperator>OR</LogicalOperator>
</SearchCriterionGroupArgs>
Given a an input set of FieldID values, I need to search the table to find if there are any records whose search criteria refer to one of those values (these are represented in the "int" nodes under the "FieldIDs" nodes.)
By running this query:
select CAST(OptionalConditions as xml).query('//FieldIDs')
from tblMyTable
I get the results:
<FieldIDs>
<int>1026</int>
<int>478</int>
</FieldIDs>
<FieldIDs>
<int>1017</int>
</FieldIDs>
<FieldIDs>
<int>1018</int>
</FieldIDs>
(currently there's only one record in the table with xml data in it.)
But I'm just getting started with this stuff and I don't know what the notation would be to check those lists for the existence of any of an arbitrary set of FieldIDs. I don't need to retrieve any particular nodes, just true or false for whether the input field IDs are referenced anywhere in the search.
Thanks for your help!
Edit: using Ranon's solution, I got it working using a query like this:
SELECT *
FROM myTable
WHERE CAST(OptionalConditions as xml).exist('//FieldIDs/int[.=(1019,111,1018)]') = 1
Fetch all FieldIDs and compare them with the set id IDs to check against. XQuery's =-operator compares in a set-based semantics, so if one of the IDs on the left side equal on one the right, this expression will evaluate to true.
//FieldIDs/int = (42, 478)
As "478" is a FieldID, this query will return true. "42" is one not available.
I'm not sure about whether you will be able to cast the result to some sql-server-boolean-type as I haven't got one running, but you will easily be able to try out yourself.
If you're also interested in the nodes contained, you could use this query:
//FieldIDs/int[. = (42,478)]

How to use FULL-TEXT SEARCH in H2 Database?

Consider the following example
CREATE ALIAS IF NOT EXISTS FT_INIT FOR "org.h2.fulltext.FullText.init";
CALL FT_INIT();
DROP TABLE IF EXISTS TEST;
CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR);
INSERT INTO TEST VALUES(1, 'Hello World');
CALL FT_CREATE_INDEX('PUBLIC', 'TEST', NULL);
and i have executed the following query
SELECT * FROM FT_SEARCH('Hello', 0, 0);
But this query is returning "PUBLIC"."TEST" WHERE "ID"=1 .
Do i have to again execute this "PUBLIC"."TEST" WHERE "ID"=1 to get the record containing 'Hello' word ?
What is the query to search all records with 'ell' word in them from the FT_Search. such as like %ell% in H2 Native Full-Text Search
Yes, each row in a query using FT_SEARCH represents a schema-table-row where one of the key words was found. The search is case insensitive, and the text parameter to FT_SEARCH may include more than one word. For example,
DELETE FROM TEST;
INSERT INTO TEST VALUES(1, 'Hello World');
INSERT INTO TEST VALUES(2, 'Goodbye World');
INSERT INTO TEST VALUES(3, 'Hello Goodbye');
CALL FT_REINDEX();
SELECT * FROM FT_SEARCH('hello goodbye', 0, 0);
returns only row three:
QUERY SCORE
"PUBLIC"."TEST" WHERE "ID"=3 1.0
Also note that FT_SEARCH_DATA may be used to retrieve the data itself. For example,
SELECT T.* FROM FT_SEARCH_DATA('hello', 0, 0) FT, TEST T
WHERE FT.TABLE='TEST' AND T.ID=FT.KEYS[0];
returns both rows containing the keyword:
ID NAME
1 Hello World
3 Hello Goodbye
Apache Lucene supports wildcard searches, although leading wildcards (e.g. *ell) tend to be expensive.
Do i have to again execute this "PUBLIC"."TEST" WHERE "ID"=1 to get the record containing 'Hello' word ?
Yes, except if you use a join as described by trashgod. The reason is: usually rows are much larger than just two words. For example, a row contains a CLOB with a document. If the result of the fulltext search would contain the data, then fulltext search would be much slower.
What is the query to search all records with 'ell' word in them from the FT_Search. such as like %ell% in H2 Native Full-Text Search
The native fulltext search can't do that directly. The reason is: fulltext search only indexes whole words. (By the way: does Google support searches if you only know a part of a word? Apache Lucene does support it) Actually, for H2, there would be a way: first, search the words table (FT.WORDS) for matches, and then use a regular search.

Resources