SQL Server oddity with full-text indexing and case sensitivity and numbers

SQL Server oddity with full-text indexing and case sensitivity and numbers - sql-server

This problem may be unique to our server, but I can't tell from the symptoms where the issue may lie.
I have a field (searchtitle) in a table (fsItems) that has a full-text index turned ON. For the record with primary key (IDItem) 704629 the content of this field is "TEST AFA 0 TEST".
Surprisingly, the following query returns no results:
SELECT * FROM fsItems WHERE CONTAINS(searchtitle,'AFA') AND IDItem = 704629
However, if I change the content to be "TEST afa 0 TEST" or "TEST AFA O TEST" (capital "O" instead of zero) the query returns the record. (It also returns the record if I change the content to "TEST AFB 0 TEST" and the CONTAINS argument to 'AFB'.)
At first I thought maybe AFA was some kind of stop word, but that wouldn't explain why changing zero to upper-case "O" returns the proper result.
Any idea what is going on here?
Thanks for any suggestions

Very interesting little quirk. It appears SQL Server is considering "AFA 0" as a single "word". My guess is this is an issue with the word breakers configured for standard English. It appears you can manually adjust them, but it doesn't look simple or intuitive. See Microsoft's how to documentation here
Identifying Words in Full-text Index
The below script lists every word in a full text index. If you run this against your table, you'll see in column display_term word "AFA 0". Side note: this script is also very useful in optimizing full text indexes by identifying "noisy" words to add to your stop list
Select *
From sys.dm_fts_index_keywords(Db_Id(),Object_Id('dbo.tbl_fulltext_test') /*Replace with your table name*/)
Order By document_count Desc
Full SQL Used to Identify the Issue
CREATE TABLE tbl_fulltext_test
(ID int constraint PK_fulltext_test primary key identity (1,1)
,String Varchar(1000)
)
Create Fulltext Catalog ct_test
With Accent_Sensitivity = Off
Create Fulltext Stoplist sl_test
From System Stoplist;
Create Fulltext Index On tbl_fulltext_test(String)
Key Index PK_fulltext_test On ct_test
With Stoplist = sl_test,Change_Tracking Auto;
INSERT INTO tbl_fulltext_test
VALUES
('TEST AFA 0 TEST') /*Zero*/
,('TEST afa 0 TEST') /*Zero*/
,('TEST AFB 0 TEST') /*AFB*/
,('TEST AFA O TEST') /*Letter O*/
/*Returns rows 2 and 4*/
SELECT *
FROM tbl_fulltext_test
WHERE CONTAINS (String,'AFA')
/*Returns row 1*/
SELECT *
FROM tbl_fulltext_test
WHERE CONTAINS (String,'"AFA 0"')

Related

How to include ToUpper() INSIDE query condition?

I have this query that is supposed to get all results where active=TRUE in table, but i want to ensure if users lets say changed the active value in the Table to say "True" or "tRue" that the query recognizes it as intended TRUE, by somehow applying Uppercase ON the query condition all the time
$Table = Query "SELECT * from [dbo].[$cubeTable] WHERE [active] = 'TRUE'.ToUpper()"
write-host $Table += $row.Item("active")
Notice, this is what I have but of course, it throws error
WHERE [active] = 'TRUE'.ToUpper()"
Exception calling "Fill" with "1" argument(s): "Cannot call methods on varchar."

Default collation will provide the functionality you require (case-insensitive match) by default, however if you want to force an insensitive match if your collation is case sensitive then you can use UPPER function but be aware this will cause a full scan of your table (and can therefore have some major performance implications).
To check your current collation:
SELECT DATABASEPROPERTYEX('DbName', 'Collation') SQLCollation;
This will give you the collation, example Latin1_General_CI_AS
Here CI means Case Insensitive
If you have this then you are good to go. If not you could do:
SELECT * from [dbo].[$cubeTable] WHERE UPPER([active]) = 'TRUE'
But this will scan your whole table.
If you have the ability to change your schema you could force the collation for the column if you want:
CREATE TABLE [#CollationTest]
(
[MyColumnName] VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS
)
INSERT INTO [#CollationTest] ([MyColumnName]) VALUES ('Value123')
INSERT INTO [#CollationTest] ([MyColumnName]) VALUES ('value123')
SELECT * FROM [#CollationTest] WHERE [MyColumnName] = 'value123' -- Returns 2 rows
DROP TABLE [#CollationTest]

Strange Behaviour on MSSQL Stored Procedure using Conditional WHERE with CONTAINS (Full Text Index)

I need some help from a MS SQL Master...
Short version:
When I execute a Conditional Where followed by a Contains, my query delays 1 minute (In its normal execution, it takes 200 milliseconds).
With this query, everything works fine:
Where
Contains(table.product_name, #search_word)
But with a Conditional Where, it takes 1 minute to execute:
Where
(#ExecuteWhereStatement = 0 Or (Contains(table.product_name, #search_word))
Long Version:
I'm using a stored procedure that receives some parameters. This Stored Procedure query a really large table, but everything is indexed properly and the query goes very well so far.
The main query is a little big, so I want to make the WHERE clause more smart possible, to avoid repeat multiple times the same statement.
The whole idea of the DataBase, is a history of purchases made by the State. So this query involves 3 tables:
Table 1 (table_purchase) - The purchase itself
id_purchase int (PK)
date_purchase datetime
buyer_code int (Nullable)
Table 2 (table_purchase_product) - The Items of a Purchase
id_product int (PK)
id_purchase int (FK of table_purchase)
product_quantity int (Nullable)
product_name varchar(255) (Nullable) (Full-Text-Indexed)
product_description varchar(2000) (Nullable) (Full-Text-Indexed)
id_product_bid_winner int (FK of table_product_bid)
Table 3 (table_product_bids) - The Bids for Each product of a Purchase
id_product_bid int (PK)
id_product int (FK of table_purchase_product)
product_brand varchar(255) (Nullable) (Full-Text-Indexed)
bid_value decimal (20,6)
So basicly, We have a "Purchase", that has several "Products (or Items)", and each "Product" has some "Bids (or Prices)"
And there is the Bad Girl (The SQL Stored Procedure):
ALTER PROCEDURE [dbo].[procPesquisaFullText]
#search_date datetime,
#search_word varchar(8000),
#search_brand varchar(255),
#only_one_bid bit = 0,
#search_buyer_code int = 0,
#quantityFrom decimal(20,6) = 0,
#quantityTo decimal(20,6) = 0
AS
BEGIN
SET NOCOUNT ON;
Declare #ExecuteWordSearch AS bit;
if (#uasg != 0 And #search_word = '')
begin
Set #ExecuteWordSearch = 0;
Set #search_word = 'nothing';
end
else
begin
Set #ExecuteWordSearch = 1;
end
Declare #ExecuteBrandSearch AS bit;
if (#search_brand = '')
begin
Set #ExecuteBrandSearch = 0;
Set #search_brand = 'nothing';
end
else
begin
Set #ExecuteMarcaSearch = 1;
end
begin
SELECT
pp.id_product,
pp.id_purchase,
pp.description
FROM
table_purchase_product pp
inner join table_purchase p on p.id_purchase = pp.id_purchase
WHERE
(p.date_purchase >= #search_date)
and (#search_buyer_code = 0 or (l.buyer_code = #search_buyer_code))
and (#quantityFrom = 0 or (li.product_quantity >= #QuantityFrom))
and (#quantityTo = 0 or (li.product_quantity <= #QuantityTo))
and (contains(pp.product_description, #search_word) or contains(pp.product_name, #search_word))
and (#only_one_bid = 0
or ((Select COUNT(*) From table_product_bid Where table_product_bid.id_product = pp.id_product) = 1))
and (#ExecuteBrandSearch = 0 Or (exists(
select 1
from table_product_bid ppb
where ppb.id_product_bid = pp.id_product_bid_winner
and contains(ppb.product_brand, #search_brand)
)
))
ORDER BY p.date_purchase DESC
end
END
So far, so good...
In the beginning I set two variables, used inside the query.
The first, verify if the user specified a "Buyer Code" AND didn't specify a "Search Word" (So, not the Product's description nor the Product's name is verified)
The second, verify if the user specified a "Specific Brand". If so, then the Winning Bid's BRAND is verified to match the users one.
Observation: You'll notice that when the "Search Words" is empty, I set them to "nothing". I do it because if the search term in the Contains is empty, it throws me a exception, even when it's not executed (I tested it in another query, absolutely isolated too)
As You can see, my user is able to search for:
- "Products" of Some Distinct Buyer "Purchase" (passing the #search_buyer_code parameter)
- A "Product" that contains a distinct word in its name or description
- A "Product" that has the Winner Bid of a specific Brand
- A "Product" that has only 1 bid at all
- A "Product" with a maximum and minimum quantity
And You'll notice that I used a lot of Conditions INSIDE the Where, producing a very dynamic Where, instead of using a "BIG If Else" statement, and repeating a lot of code. (I guess some "Googlers" will land here looking for Conditionally Wheres, and If so, I'm glad to help!)
Ok, so everything works veeery great at all. The query executes flawless. But here is the strange, damn, tricky issue:
If I want the user to be able to specify only a "Buyer Code" for Purchase, but No Word to Search of the Product using the code above (which is the first piece of code in the stored procedure does):
Changing from:
and (contains(pp.product_description, #search_word) or contains(pp.product_name, #search_word))
To:
and (#ExecuteWordSearch = 0 Or (contains(pp.product_description, #search_word) or contains(pp.product_name, #search_word)))
The query delays near 1 minute! (the execution is about 200 milliseconds for the query above).
But WHY??? I Use the same Logic of in all "Conditionally Wheres". I also use the same logic of having a flag/variable to indicate when execute the Where clause in the Word Search and the Brand Search, but the Brand Search works PERFECTLY! So Why, WHY only when I use the condition followed by a Contains my query delays 1 minute????
And this issue is not related with the amount of data, because I tried removing the entire Contains condition, allowing a lot of data to return, and it takes 1 second maximum...
Ow, It's a Microsoft SQL Server 2008 R2.
Thanks already for You read so far!

I cannot find the documentation I had around a very similar issue, but it sounded so familiar, I at least wanted to share what I remembered. Part of the issue is that for Sql Server, the full-text search engine is separate from the regular query execution engine, and so when you mix the two, in some cases, performance can tank. This is particularly true when the condition is an 'OR' rather than and 'AND'. (I remember hitting this exact situation). Conditional ANDs worked fine. But for OR, it's as if each condition gets evaluated repeatedly row by row.
Among the workarounds, one is, as already suggested, create your sql dynamically before execution.
Another would be to break the full-text and non-full text conditions into two search functions (literally UDF's) and then do whatever is needed (INTERSECT, EXCEPT, etc) with the two resultsets.

Try changing your WHERE clause to use a CASE statement, e.g.:
WHERE
CASE
WHEN #ExecuteWhereStatement = 0 THEN 1
WHEN #ExecuteWhereStatement = 1 THEN
CASE
WHEN CONTAINS([table].product_name, #search_word) THEN 1
ELSE 0
END
END = 1;

Avoid Adding Duplicate Records

I m trying to write if statement to give error message if user try to add existing ID number.When i try to enter existing id i get error .untill here it s ok but when i type another id no and fill the fields(name,adress etc) it doesnt go to database.
METHOD add_employee.
DATA: IT_EMP TYPE TABLE OF ZEMPLOYEE_20.
DATA:WA_EMP TYPE ZEMPLOYEE_20.
Data: l_count type i value '2'.
SELECT * FROM ZEMPLOYEE_20 INTO TABLE IT_EMP.
LOOP AT IT_EMP INTO WA_EMP.
IF wa_emp-EMPLOYEE_ID eq pa_id.
l_count = l_count * '0'.
else.
l_count = l_count * '1'.
endif.
endloop.
If l_count eq '2'.
WA_EMP-EMPLOYEE_ID = C_ID.
WA_EMP-EMPLOYEE_NAME = C_NAME.
WA_EMP-EMPLOYEE_ADDRESS = C_ADD.
WA_EMP-EMPLOYEE_SALARY = C_SAL.
WA_EMP-EMPLOYEE_TYPE = C_TYPE.
APPEND wa_emp TO it_emp.
INSERT ZEMPLOYEE_20 FROM TABLE it_emp.
CALL FUNCTION 'POPUP_TO_DISPLAY_TEXT'
EXPORTING
TITEL = 'INFO'
TEXTLINE1 = 'Record Added Successfully.'.
elseif l_count eq '0'.
CALL FUNCTION 'POPUP_TO_DISPLAY_TEXT'
EXPORTING
TITEL = 'INFO'
TEXTLINE1 = 'Selected ID already in database.Please type another ID no.'.
ENDIF.
ENDMETHOD.

I'm not sure I'm getting your explanation. Why are you trying to re-insert all the existing entries back into the table? You're just trying to insert C_ID etc if it doesn't exist yet? Why do you need all the existing entries for that?
If so, throw out that select and the loop completely, you don't need it. You have a few options...
Just read the table with your single entry
SELECT SINGLE * FROM ztable INTO wa WITH KEY ID = C_ID etc.
IF SY-SUBRC = 0.
"this entry exists. popup!
ENDIF.
Use a modify statement
This will overwrite duplicate entries with new data (so non key fields may change this way), it wont fail. No need for a popup.
MODIFY ztable FROM wa.
Catch the SQL exceptions instead of making it dump
If the update fails because of an exception, you can always catch it and deal with exceptional situations.
TRY .
INSERT ztable FROM wa.
CATCH sapsql_array_insert_duprec.
"do your popup, the update failed because of duplicate records
ENDTRY.

I think there's a bug when appending in internal table 'IT_EMP' and inserting in 'ZEMPLOYEE_20' table.
Suppose you append the first time and then you insert. But when you append the second time you will have 2 records in 'IT_EMP' that are going to be inserted in 'ZEMPLOYEE_20'. That is because you don't refresh or clear the internal table and there you will have a runtime error.
According to SAP documentation on 'Inserting Lines into Tables ':
Inserting Several Lines
To insert several lines into a database table, use the following:
INSERT FROM TABLE [ACCEPTING DUPLICATE KEYS] . This
writes all lines of the internal table to the database table in
one single operation. The same rules apply to the line type of
as to the work area described above. If the system is able to
insert all of the lines from the internal table, SY-SUBRC is set to 0.
If one or more lines cannot be inserted because the database already
contains a line with the same primary key, a runtime error occurs.
Maybe the right direction here is trying to insert the work area directly but before you must check if record already exists using the primary key.
Check the SAP documentation on this issue clicking the link before.
On the other hand, once l_count is zero because of l_count = l_count * '0'. that value will never change to any other number making that you won't append or insert again.

why are you retrieving all entries from zemployee_20 ?
You can directly check wether the 'id' exists already or not by using select single. If exists, then show message, if not, add.
It is recommended to retrieve only one field when its needed and not the entire table with asterisc *
SELECT single employee_id FROM ZEMPLOYEE_20 where employee_id = p_id INTO v_id. ( or field in structure )
if sy-subrc = 0. "exists
"show message
else. "not existing id
"populate structure and then add record to Z table
endif.
Furthermore, l_count is not only unnecessary but also bad implemented.

You can directly use the insert query,if the sy-subrc is unsuccessful raise the error message.
WA_EMP-EMPLOYEE_ID = C_ID.
WA_EMP-EMPLOYEE_NAME = C_NAME.
WA_EMP-EMPLOYEE_ADDRESS = C_ADD.
WA_EMP-EMPLOYEE_SALARY = C_SAL.
WA_EMP-EMPLOYEE_TYPE = C_TYPE.
INSERT ZEMPLOYEE_20 FROM WA_EMP.
If sy-subrc <> 0.
Raise the Exception.
Endif.

NVarchar Prefix causes wrong index to be selected

I have an entity framework query that has this at the heart of it:
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets
WHERE widgets.[SomeOtherOrderId] = N'SOME VALUE HERE'
The execution plan for this chooses an index that is a composite of three columns. This takes 10 to 12 seconds.
However, there is an index that is just [SomeOtherOrderId] with a few other columns in the "include". That is the index that should be used. And when I run the following queries it is used:
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets
WHERE widgets.[SomeOtherOrderId] = CAST(N'SOME VALUE HERE' AS VARCHAR(200))
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets
WHERE widgets.[SomeOtherOrderId] = 'SOME VALUE HERE'
This returns instantly. And it uses the index that is just SomeOtherOrderId
So, my problem is that I can't really change how Entity Framework makes the query.
Is there something I can do from an indexing point of view that could cause the correct index to be selected?

As far as I know, since version 4.0, EF doesn't generate unicode parameters for non-unicode columns. But you can always force non-unicode parameters by DbFunctions.AsNonUnicode (prior to EF6, DbFunctions is EntityFunctions):
from o in db.WidgetOrder
where o.SomeOtherOrderId == DbFunctions.AsNonUnicode(param)
select o

Try something like ....
SELECT 1 AS dummy
FROM [dbo].[WidgetOrder] AS widgets WITH (INDEX(Target_Index_Name))
WHERE widgets.[SomeOtherOrderId] = N'SOME VALUE HERE'
This query hint sql server explicitly what index to use to get resutls.

How to use FULL-TEXT SEARCH in H2 Database?

Consider the following example
CREATE ALIAS IF NOT EXISTS FT_INIT FOR "org.h2.fulltext.FullText.init";
CALL FT_INIT();
DROP TABLE IF EXISTS TEST;
CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR);
INSERT INTO TEST VALUES(1, 'Hello World');
CALL FT_CREATE_INDEX('PUBLIC', 'TEST', NULL);
and i have executed the following query
SELECT * FROM FT_SEARCH('Hello', 0, 0);
But this query is returning "PUBLIC"."TEST" WHERE "ID"=1 .
Do i have to again execute this "PUBLIC"."TEST" WHERE "ID"=1 to get the record containing 'Hello' word ?
What is the query to search all records with 'ell' word in them from the FT_Search. such as like %ell% in H2 Native Full-Text Search

Yes, each row in a query using FT_SEARCH represents a schema-table-row where one of the key words was found. The search is case insensitive, and the text parameter to FT_SEARCH may include more than one word. For example,
DELETE FROM TEST;
INSERT INTO TEST VALUES(1, 'Hello World');
INSERT INTO TEST VALUES(2, 'Goodbye World');
INSERT INTO TEST VALUES(3, 'Hello Goodbye');
CALL FT_REINDEX();
SELECT * FROM FT_SEARCH('hello goodbye', 0, 0);
returns only row three:
QUERY SCORE
"PUBLIC"."TEST" WHERE "ID"=3 1.0
Also note that FT_SEARCH_DATA may be used to retrieve the data itself. For example,
SELECT T.* FROM FT_SEARCH_DATA('hello', 0, 0) FT, TEST T
WHERE FT.TABLE='TEST' AND T.ID=FT.KEYS[0];
returns both rows containing the keyword:
ID NAME
1 Hello World
3 Hello Goodbye
Apache Lucene supports wildcard searches, although leading wildcards (e.g. *ell) tend to be expensive.

Do i have to again execute this "PUBLIC"."TEST" WHERE "ID"=1 to get the record containing 'Hello' word ?
Yes, except if you use a join as described by trashgod. The reason is: usually rows are much larger than just two words. For example, a row contains a CLOB with a document. If the result of the fulltext search would contain the data, then fulltext search would be much slower.
What is the query to search all records with 'ell' word in them from the FT_Search. such as like %ell% in H2 Native Full-Text Search
The native fulltext search can't do that directly. The reason is: fulltext search only indexes whole words. (By the way: does Google support searches if you only know a part of a word? Apache Lucene does support it) Actually, for H2, there would be a way: first, search the words table (FT.WORDS) for matches, and then use a regular search.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

SQL Server oddity with full-text indexing and case sensitivity and numbers - sql-server

Related

How to include ToUpper() INSIDE query condition?

Strange Behaviour on MSSQL Stored Procedure using Conditional WHERE with CONTAINS (Full Text Index)

Avoid Adding Duplicate Records

NVarchar Prefix causes wrong index to be selected

How to use FULL-TEXT SEARCH in H2 Database?

Categories

Resources