Issue with MSSQL full-text search - sql-server

The original table contains JSON, but I've stripped it down to the table below:
id | json
1 | "name":"one.it.two"
2 | "name": "one.it.two"
The difference between the two rows is the space after the colon (:).
The catalog has no stopwords.
Searching with CONTAINS (json, 'it') returns both rows.
Searching with CONTAINS (json, 'two') returns both rows.
Searching with CONTAINS (json, 'one') returns only the second row.
Why does searching for one not return the first row?
I've reduced the test case even further, thanks to @RobinWebb.
This is no longer a JSON or delimited-text issue.
id | text1
1 | name:first.it
2 | name: first.it
The difference between the two rows is the space after the colon.
Searching for first does not return the first row.
The search works if I change first.it to first.and.
Thanks to @AlwaysLearning: this is an issue with the word breaker.
The results from sys.dm_fts_parser are not consistent:
text | words
name:first.it | name:first.it, name, :first, it
name:first.and | name, first.and, first, and
name:first,it | name, first, it
I used SELECT * FROM sys.dm_fts_parser ('"<text>"', 1033, NULL, 0)

Based on the info provided in this answer https://dba.stackexchange.com/a/65845/94130, it seems that the word breaker treats .it as a special word (possibly a top-level domain).
I can only infer that this "special word logic" has a bug (or a feature) in it, where : is treated as part of the following word. Examples:
SELECT * FROM sys.dm_fts_parser (' "name:first.it" ', 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser (' "name:first.net" ', 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser (' "name:first.com" ', 1033, 0, 0);
SELECT * FROM sys.dm_fts_parser (' "name:first.gov" ', 1033, 0, 0);
Notice that each query always returns one extra result. I assume the extra row is included when the word breaker thinks that part of the string is a URL.
Output for the .gov query:
special_term | display_term | expansion_type | source_term
Exact Match | name:first.gov | 0 | name:first.gov
Exact Match | name | 0 | name:first.gov
Exact Match | :first | 0 | name:first.gov
Exact Match | gov | 0 | name:first.gov
Note that some words are not affected:
SELECT * FROM sys.dm_fts_parser (' "name:first.he" ', 1033, 0, 0);
I modified the code provided in https://dba.stackexchange.com/a/25848/94130 to list every character that is treated this way.
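declare @i integer
declare @cnt integer
set @i=0
-- try every character code from 0 to 254 between 'name' and 'first.net'
while @i<255
begin
set @cnt=0
-- double any embedded quote so the search phrase stays well-formed
select @cnt=COUNT(1) FROM sys.dm_fts_parser ('"name'+REPLACE(CHAR(@i),'"','""')+'first.net"', 1033, 0, 0)
WHERE display_term = CHAR(@i) + 'first'
-- a term such as ':first' means the character was glued onto the following word
if @cnt=1
begin
print 'this char - '+CASE WHEN @i > 31 THEN char(@i) ELSE '' END+' - char('+convert(varchar(3),@i)+') is included'
end
set @i=@i+1
end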
Output:
this char - - char(0) is included
this char - : - char(58) is included
this char - ­ - char(173) is included
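The char(173) line looks empty because of the character itself. Assuming a Latin1 collation that uses Windows code page 1252 (for example SQL_Latin1_General_CP1_CI_AS; this is an assumption about the database), byte 173 is the soft hyphen, which is normally invisible. The small Python check below names each reported character. The `char_name` helper is hypothetical, not part of the original answer.

```python
import unicodedata


def char_name(code: int, codepage: str = "cp1252") -> str:
    """Name of the character that CHAR(code) yields under the given code page.

    cp1252 is an assumption; other collations use other code pages.
    """
    ch = bytes([code]).decode(codepage)
    # Control characters such as NUL have no Unicode name, so fall back to a label.
    return unicodedata.name(ch, "<unnamed control character>")


for code in (0, 58, 173):
    print(code, char_name(code))
```

Under that assumption, the three "included" characters are NUL, the colon and the soft hyphen.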
There is a Microsoft article explaining how to switch word breakers/stemmers, which may (or may not) solve this; I have not tried it.
Note: the above code was executed on Windows 11 and SQL Server 2019 Developer Edition.
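The test case shows that the row with a space after the colon ("name: first.it") is found, while the row without it is not. A possible workaround, untested against the real word breaker, is to insert that space before the text reaches the full-text index. This could be done in the application or in a separate column that exists only for indexing. The Python sketch below shows the normalization step. The function name is hypothetical. It also changes values such as times ("11:35:41") and URLs ("http://"), so it should not overwrite the original data.

```python
import re

# A ':' that is immediately followed by a non-space character.
_COLON_WITHOUT_SPACE = re.compile(r":(?=\S)")


def add_space_after_colon(text: str) -> str:
    """Hypothetical pre-indexing step: turn 'name:first.it' into 'name: first.it'."""
    return _COLON_WITHOUT_SPACE.sub(": ", text)


print(add_space_after_colon('"name":"one.it.two"'))
```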


SQL Server 2014: Invalid length parameter passed to the LEFT or SUBSTRING function

I am getting the following error while trying to execute the query shown below in SQL Server 2014. I have customer chat data, and I want to replace the customer's name with "Customer" and the agent's name with "Agent".
Error: Invalid length parameter passed to the LEFT or SUBSTRING function.
Data format:
11:35:41 Daniella Sichtman : I don't mind. It's ok
11:35:55 Daniella Sichtman : Did you understand my problem?
11:36:09 Madan : Yes, I got your issue.
11:36:20 Madan : Please stay connected while I check what best I can do for you.
11:37:01 Daniella Sichtman : OK. If I may suggest. Mail the hotel that we need 2 nights. I have their contact information if you need that.
11:37:21 Daniella Sichtman : The room are availible they told us
11:37:41 Daniella Sichtman : Just need an ok
11:37:43 Madan : Have you visited the hotel reception to extend your stay ?
11:38:01 Daniella Sichtman : Yes. They told us you need to give the ok
11:39:14 Madan : Nico, I would like to Inform you that we have already authorized to the hotel to extend the stay for our guests.
11:39:46 Daniella Sichtman : They don't know about that or did you told them this morning?
SQL code:
SELECT REPLACE(Transcript,
SUBSTRING(SUBSTRING(transcript, CHARINDEX(' ', transcript) + 1, (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript))),
1,
CASE
WHEN CHARINDEX(':', SUBSTRING(transcript, CHARINDEX(' ', transcript) + 1, (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript)))) = 0 THEN LEN(Transcript)
ELSE CHARINDEX(':', SUBSTRING(transcript, CHARINDEX(' ', transcript) + 1, (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript)))) - 2
END),
'Customer')
FROM [easyJet_Staging].[dbo].[KANA_Chat_Transcript_StageAll]
WHERE Transcript IS NOT NULL
AND Transcript <> '';
Note: I have seen many posts but none gave me the desired output. Could anyone help me?
Expected output:
XX:XX:XX Agent : I do understand what you are saying but I am afraid, but any instrument larger than XXcm x XXXcm x XXcm like a double bass or harp can’t be taken on board the aircraft as cabin baggage.
XX:XX:XX Customer : Hello, it's a GUITAR. it is just X cm bigger, the case is Xcm thicker.
XX:XX:XX Customer : it's a soft case that can be pushed to fit your dimensions
XX:XX:XX Agent : I do understand what you are saying but even it is X cm then also they will ask you to put it on hold and we do understand that you are worried if it gets damaged but there won't be any case and you can get the fragile tag from the Airport so that it will be taken care.
XX:XX:XX Customer : What if I take it on board (priority boarding) and it fits within their dimensions exactly and can go in an overhead locker?
The SUBSTRING documentation says it accepts any int for the second argument (start), but the third argument (length) must not be negative. So the problem is probably in the third parameter. ( https://learn.microsoft.com/en-us/sql/t-sql/functions/substring-transact-sql?view=sql-server-2017 )
It sounds like you have records in your data that don't fit what you are expecting. Try the query below to identify them: it looks for any records where the third parameter of the SUBSTRING is zero or negative.
Because of the way SQL Server optimizes queries, expressions in the SELECT list are sometimes evaluated for rows before the WHERE clause has filtered them out. It is safest to make sure your SELECT works on all records, even the ones that the WHERE clause will filter out.
SELECT *
FROM
[easyJet_Staging].[dbo].[KANA_Chat_Transcript_StageAll]
WHERE (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript)) <= 0
OR CASE
WHEN CHARINDEX(':', SUBSTRING(transcript, CHARINDEX(' ', transcript) + 1, (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript)))) = 0 THEN LEN(Transcript)
ELSE CHARINDEX(':', SUBSTRING(transcript, CHARINDEX(' ', transcript) + 1, (((LEN(transcript)) - CHARINDEX(':', REVERSE(transcript))) - CHARINDEX(' ', transcript)))) - 2
END <=0
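To see how a row can produce a negative length, here is a small Python model of the T-SQL functions involved. It approximates LEN (trailing spaces ignored), CHARINDEX (1-based, 0 when not found) and SUBSTRING (error on a negative length). It evaluates the third argument of the inner SUBSTRING for one line from the question and for a hypothetical line whose last ':' comes before its first space. The helper names are illustrative only.

```python
def tsql_len(s: str) -> int:
    """LEN(): length without trailing spaces."""
    return len(s.rstrip(" "))


def charindex(needle: str, haystack: str) -> int:
    """CHARINDEX(): 1-based position of needle, or 0 when it is absent."""
    return haystack.find(needle) + 1


def substring(s: str, start: int, length: int) -> str:
    """SUBSTRING(): 1-based start; a negative length is an error."""
    if length < 0:
        raise ValueError("Invalid length parameter passed to the LEFT or SUBSTRING function.")
    begin = start - 1
    end = begin + length
    return s[max(begin, 0):max(end, 0)]


def name_length(t: str) -> int:
    """(LEN(t) - CHARINDEX(':', REVERSE(t))) - CHARINDEX(' ', t), as in the question."""
    return tsql_len(t) - charindex(":", t[::-1]) - charindex(" ", t)


line = "11:35:41 Daniella Sichtman : I don't mind. It's ok"
print(name_length(line), repr(substring(line, charindex(" ", line) + 1, name_length(line))))
print(name_length("Hello:world test"))  # last ':' before the first space -> negative
```

Any Transcript value where this arithmetic goes negative (or where the CASE branch yields a negative number) triggers the error.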

Query fails on "converting character string to smalldatetime data type"

I've been tasked with fixing some SQL code that doesn't work. The query reads from a view and filters it with a predicate. It currently looks like this:
SELECT TOP (100) Beginn
FROM V_LLAMA_Seminare
-- Removing the following line makes the query succeed; keeping it breaks it
where Beginn > (select cast (getdate() as smalldatetime))
order by Beginn desc
When I run the above query, I am greeted with the following error.
Msg 295, Level 16, State 3, Line 1
Conversion failed when converting character string to smalldatetime data type.
I decided to remove the WHERE clause, and now it runs returning 100 rows.
At first, I thought that behind the scenes SQL Server was somehow including my predicate when bringing back the view. But then I investigated how the view was being created, especially the Beginn field, and at no point does it return a string.
Long story short, the column that becomes the Beginn field is a BIGINT timestamp like 201604201369.... The original user transforms this BIGINT to a smalldatetime using the following magic.
....
CASE WHEN ma.datum_dt = 0
THEN null
ELSE CONVERT(smalldatetime, SUBSTRING(CAST(ma.datum_dt AS varchar(max)),0,5) + '-' +
SUBSTRING(CAST(ma.datum_dt AS varchar(max)),5,2) + '-' +
SUBSTRING(CAST(ma.datum_dt AS varchar(max)),7,2) + ' ' +
SUBSTRING(CAST(ma.datum_dt AS varchar(max)),9,2) +':'+
SUBSTRING(CAST(ma.datum_dt AS varchar(max)),11,2) +':' +
RIGHT(CAST(ma.datum_dt AS varchar(max)),2)) END AS Beginn
...
My last attempt at finding the problem was to query the view, run ISDATE over the Beginn column and check whether it ever returned 0, which it never did.
So my question is twofold: first, why does a predicate break something, and second, where on earth is this string error coming from when the Beginn value is built from a BIGINT?
Any help is greatly appreciated.
This problem is culture-related.
Try this, and then change SET LANGUAGE to GERMAN:
SET LANGUAGE ENGLISH; -- with GERMAN, 'yyyy-MM-dd HH:mm:ss' is read as year-day-month and the conversion fails
DECLARE @bi BIGINT=20160428001600;
SELECT CASE WHEN @bi = 0
THEN null
ELSE CONVERT(datetime, SUBSTRING(CAST(@bi AS varchar(max)),0,5) + '-' +
SUBSTRING(CAST(@bi AS varchar(max)),5,2) + '-' +
SUBSTRING(CAST(@bi AS varchar(max)),7,2) + ' ' +
SUBSTRING(CAST(@bi AS varchar(max)),9,2) +':'+
SUBSTRING(CAST(@bi AS varchar(max)),11,2) +':' +
RIGHT(CAST(@bi AS varchar(max)),2)) END AS Beginn
It is a bad habit to assume that date values look the same everywhere ("Oh no, my small application will never go international...").
Try to stick to culture-independent formats such as the ODBC or ISO 8601 formats.
EDIT
A very easy solution for you would have been to replace the blank with a "T":
SUBSTRING(CAST(ma.datum_dt AS varchar(max)),7,2) + 'T' +
Then it is ISO 8601 (yyyy-MM-ddTHH:mm:ss) and will convert regardless of the language setting.
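The difference between the two string shapes can be illustrated outside SQL Server. The Python sketch below rebuilds the ISO 8601 string from a 14-digit yyyyMMddHHmmss BIGINT, as in the answer's example value. It then models the German (dmy) reading of 'yyyy-MM-dd HH:mm:ss' as year-day-month with strptime. That is an analogy for SQL Server's datetime parsing rules, not its actual parser, and the helper names are hypothetical.

```python
from datetime import datetime


def bigint_to_iso(value: int) -> str:
    """Rebuild yyyy-MM-ddTHH:mm:ss from a 14-digit yyyyMMddHHmmss BIGINT."""
    s = str(value)
    return f"{s[0:4]}-{s[4:6]}-{s[6:8]}T{s[8:10]}:{s[10:12]}:{s[12:14]}"


def parses_with_dmy_setting(text: str) -> bool:
    """Model of DATEFORMAT dmy for datetime: 'yyyy-xx-yy' is read as year-day-month."""
    try:
        datetime.strptime(text, "%Y-%d-%m %H:%M:%S")
        return True
    except ValueError:
        return False


iso = bigint_to_iso(20160428001600)
print(iso)
print(datetime.fromisoformat(iso))  # the 'T' form is unambiguous
print(parses_with_dmy_setting("2016-04-28 00:16:00"))  # "month 28" does not exist
```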
The solution was found after looking through @Shnugo's comment. I took my query, which contained the BIGINT-to-datetime conversion logic, and put it into a CTE with "TOP 1000000000" to avoid any implicit conversion actions (presumably because TOP stops the optimizer from pushing the outer predicate into the view). After that, my query worked. Here is what my view looks like now, with some unimportant parts omitted.
---Important part---
CREATE VIEW [dbo].[V_SomeView] AS
WITH CTE AS (
SELECT TOP 1000000000 ma.id AS MA_ID,
---Important part---
vko.extkey AS ID_VKO,
vko.text AS Verkaufsorganisation,
fi.f7000 AS MDM_Nr,
vf.f7105 AS SAPKdnr,
CASE WHEN ma.datum_dt = 0 --Conversion logic
CASE WHEN ma.endedatum_dt = 0 --Conversion logic
CONVERT(NVARCHAR(MAX),art.text) AS Art,
.....
FROM [ucrm].[dbo].[CRM_MA] ma,
[ucrm].[dbo].[CRM_fi] fi,
[ucrm].[dbo].[CRM_vf] vf,
[ucrm].[dbo].[CRM_ka] vko,
[ucrm].[dbo].[CRM_ka] art,
[ucrm].[dbo].[CRM_ka] kat
where ma.loskz = 0
and fi.loskz = 0
and vf.loskz = 0
and fi.F7029 = 0
and vf.F7023 = 0
...
GROUP BY ma.id,
vko.extkey,
vko.text,
fi.f7000 ,
vf.f7105,
ma.datum_dt,
ma.endedatum_dt,
....
)
select * FROM CTE;

MS SQL REPLACE based on 1 character to the left of the $

I am not a SQL expert so please forgive me if this is SQL 101 :).
In a SELECT statement there are two REPLACE functions. They look for a server name and its admin share d$ by its UNC path, for example '\\SERVERNAME\d$'.
They then replace '\\SERVERNAME\d$' with 'D:'.
Here is the query currently:
select Replace(p.Path,'\\SERVERNAME\d$','D:') as searchpath
,p.path as fullpath
,s.ShareName
,s.SharePath
,p.Member
,p.Access
From Paths As p
Left Outer Join Shares as s on
Replace(p.Path,'\\SERVERNAME\d$','D:') Like s.SharePath + '\%'
Up until now it has always been d$.
Today my needs have changed: I need the query to find ANY server-name UNC admin-share path regardless of the share letter (c$, d$, e$, f$, etc.) and replace it with its respective drive letter (C:, D:, E:, F:, etc.).
My thought is that the REPLACE function could find the $ and look one character to the left of it to get the proper share letter, then use that for the replace. The issue I have, not being a SQL professional, is that I know SQL can likely do what I need it to do; I just don't know how to get there. I've googled and found some examples, but haven't had any luck getting them to work.
Any help would be greatly appreciated.
You can use a combination of STUFF, PATINDEX, LEN to get what you want.
Sample Query
DECLARE @ReplaceChar VARCHAR(100) = '[prefixcharacters]\\SERVERNAME\d$[postcharacter]'
-- '_' is the PATINDEX wildcard for any single character, i.e. the drive letter
DECLARE @SearchString VARCHAR(100) = '\\SERVERNAME\_$'
SELECT
STUFF(@ReplaceChar,PATINDEX('%' + @SearchString + '%',@ReplaceChar),LEN(@SearchString),
UPPER(SUBSTRING(@ReplaceChar,PATINDEX('%' + @SearchString + '%',@ReplaceChar) + LEN(@SearchString) - 2,1)) + ':') as searchpath
WHERE PATINDEX('%' + @SearchString + '%',@ReplaceChar) > 0
Output
[prefixcharacters]D:[postcharacter]
Alternate query
You can shorten the query if you take the character just before $, as your title suggests. Something like this:
DECLARE @ReplaceChar VARCHAR(100) = '[prefixcharacters]\\SERVERNAME\d$[postcharacter]'
DECLARE @SearchString VARCHAR(100) = '\\SERVERNAME\_$'
SELECT
STUFF(@ReplaceChar,
PATINDEX('%'+@SearchString+'%',@ReplaceChar),
LEN(@SearchString),
UPPER(SUBSTRING(@ReplaceChar,CHARINDEX('$',@ReplaceChar) -1,1)) + ':')
WHERE PATINDEX('%'+@SearchString+'%',@ReplaceChar) > 0
In this query:
STUFF replaces your pattern with the character before $ followed by ':'.
The start of the pattern is identified by PATINDEX('%'+@SearchString+'%',@ReplaceChar).
D is identified by getting the CHARINDEX of '$' and then taking the previous character with SUBSTRING.
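T-SQL's REPLACE does not accept patterns, and PATINDEX supports only LIKE-style wildcards, which is why the queries above combine STUFF, PATINDEX and SUBSTRING. For comparison, here is the same rule ("\\server\x$" becomes "X:") written as a regular expression in Python. The regex also accepts any server name, not just SERVERNAME. This is only an illustration of the transformation. Running it would mean doing the rewrite outside T-SQL, for example in the application, and `to_local_path` is a hypothetical helper.

```python
import re

# Two backslashes, a server name (no backslashes), a backslash,
# one drive letter (captured), then a literal '$'.
ADMIN_SHARE = re.compile(r"\\\\[^\\]+\\([A-Za-z])\$")


def to_local_path(unc_path: str) -> str:
    """Replace the first admin-share prefix with the upper-cased drive letter and ':'."""
    return ADMIN_SHARE.sub(lambda m: m.group(1).upper() + ":", unc_path, count=1)


print(to_local_path(r"\\SERVERNAME\d$\Data\Finance"))
```

Paths without an admin-share prefix are returned unchanged, which mirrors the `PATINDEX(...) > 0` guard in the queries above.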
What about:
select Replace(SUBSTRING(p.path, 14, LEN(p.path) - 13),'$',':') as searchpath -- assumes a 10-character server name, as in \\SERVERNAME\
,p.path as fullpath
,s.ShareName
,s.SharePath
,p.Member
,p.Access
From Paths As p
Left Outer Join Shares as s on
Replace(SUBSTRING(p.path, 14, LEN(p.path) - 13),'$',':') Like s.SharePath + '\%'
Or, to compute just the searchpath value:
DECLARE @str nvarchar (100)
SET @str = '\\SERVERNAME\d$'
-- '_' in LIKE matches any single character: the drive letter at position 14
IF @str LIKE '\\SERVERNAME\_$'
SET @str = UPPER(SUBSTRING(@str, 14, 1)) + ':'
SELECT @str
Building on the previous snippet, something like:
select UPPER(SUBSTRING(p.path, 14, 1)) + ':' as searchpath
,p.path as fullpath
,s.ShareName
,s.SharePath
,p.Member
,p.Access
From Paths As p
Left Outer Join Shares as s on
SUBSTRING(p.path, 14, 1) + ':' Like s.SharePath + '\%'
I am no MySQL expert either :)
Based on the logic you mentioned in the last part of the question, I have used concat and substring to get to the drive letter in the column.
Hope this helps
select replace(path, concat(substring(path, 1, locate('$', path) - 2), substring(path, locate('$', path) - 1, 1) , '$'), concat(substring(path, locate('$', path) - 1, 1) , ':')) as searchpath ...
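The remaining part of the query would be the same. Note that LOCATE is MySQL syntax; in SQL Server, use CHARINDEX('$', path) instead (CONCAT is available from SQL Server 2012 onward).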

Get a Substring in SQL between two characters and remove all white spaces

I have strings such as:
Games/Maps/MapsLevel1/Level 1.swf
Games/AnimalWorld/Animal1.1/Level 1.1.swf
Games/patterns and spatial understanding/Level 13.5/Level 13.5.swf
I want only the file name without its extension (the string after the last slash and before the last dot), i.e. Level 1, Level 1.1 and Level 13.5. I also want to remove all white space and convert the result to lower case, so the final output should be:
level1
level1.1
level13.5, and so on.
I tried the following query but got Level 1.swf. How do I change this query?
SELECT SUBSTRING(vchServerPath, LEN(vchServerPath) - CHARINDEX('/', REVERSE(vchServerPath)) + 2, LEN(vchServerPath)) FROM Games
SELECT (left((Path), LEN(Path) - charindex('.', reverse(Path))))
FROM
(
SELECT SUBSTRING(vchServerPath,
LEN(vchServerPath) - CHARINDEX('/', REVERSE(vchServerPath)) + 2,
LEN(vchServerPath)) Path
FROM Games
) A
This should work: I kept your inner SUBSTRING, which got you part of the way, and added the stripping of the extension.
Edit: the following also removes the white space and returns lower case:
SELECT REPLACE(LOWER((left((Path), LEN(Path) - charindex('.', reverse(Path))))), ' ', '')
FROM
(
SELECT SUBSTRING(vchServerPath,
LEN(vchServerPath) - CHARINDEX('/', REVERSE(vchServerPath)) + 2,
LEN(vchServerPath)) Path
FROM Games
) A
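The query above does three things: it takes the text after the last '/', drops everything from the last '.', then removes spaces and lower-cases the result. The same steps are easy to check outside SQL Server. Below is a short Python equivalent run on the three paths from the question. `normalized_level_name` is a hypothetical name. One difference from the T-SQL version: a file name with no dot is kept whole here instead of being truncated.

```python
def normalized_level_name(server_path: str) -> str:
    """File name after the last '/', without its last '.extension',
    with spaces removed and lower-cased."""
    file_name = server_path.rsplit("/", 1)[-1]
    stem = file_name.rsplit(".", 1)[0] if "." in file_name else file_name
    return stem.replace(" ", "").lower()


for path in [
    "Games/Maps/MapsLevel1/Level 1.swf",
    "Games/AnimalWorld/Animal1.1/Level 1.1.swf",
    "Games/patterns and spatial understanding/Level 13.5/Level 13.5.swf",
]:
    print(normalized_level_name(path))
```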
Try this:
select
case
when vchServerPath is not null
then reverse(replace(substring(reverse(vchServerPath),charindex('.',reverse(vchServerPath))+1, charindex('/',reverse(vchServerPath))-(charindex('.',reverse(vchServerPath))+1)),' ',''))
else ''
end
from Games
This should work fine, with the extension removed. It hard-codes the start position 5, so it assumes a four-character extension such as .swf:
select
REVERSE(
SUBSTRING(
reverse('Games/patterns and spatial understanding/Level 13.5/Level 13.5.swf'),
5,
(charindex('/',
reverse('Games/patterns and spatial understanding/Level 13.5/Level 13.5.swf')) - 5)
))

How to do hit-highlighting of results from a SQL Server full-text query

We have a web application that uses SQL Server 2008 as the database. Our users are able to do full-text searches on particular columns in the database. SQL Server's full-text functionality does not seem to provide support for hit highlighting. Do we need to build this ourselves or is there perhaps some library or knowledge around on how to do this?
BTW, the application is written in C#, so a .NET solution would be ideal but is not necessary, as we could translate.
Expanding on Ishmael's idea, it's not the final solution, but I think it's a good way to start.
First, we need to get the list of words that the full-text engine will match for the search string:
-- @SearchString holds the user's search input
declare @SearchPattern nvarchar(1000) = 'FORMSOF (INFLECTIONAL, " ' + @SearchString + ' ")'
declare @SearchWords table (Word varchar(100), Expansion_type int)
insert into @SearchWords
select distinct display_term, expansion_type
from sys.dm_fts_parser(@SearchPattern, 1033, 0, 0)
where special_term = 'Exact Match'
There is already quite a lot one can expand on. For example, the search pattern is quite basic, and there are probably better ways to filter out the words you don't need, but at least it gives you a list of stem words etc. that would be matched by full-text search.
After you get the results you need, you can use RegEx to parse through the result set (or preferably only a subset, to speed it up, although I haven't yet figured out a good way to do so). For this I simply use two while loops and a bunch of temporary tables and variables:
declare @FinalResults table -- needs the same columns as #PrelimResults (see the insert below)
while (select COUNT(*) from #PrelimResults) > 0
begin
select top 1 @CurrID = [UID], @Text = Text from #PrelimResults
declare @TextLength int = LEN(@Text)
-- roughly: find the last '.' before the first hit and keep up to 300 characters from there
declare @IndexOfDot int = CHARINDEX('.', REVERSE(@Text), @TextLength - dbo.RegExIndexOf(@Text, '\b' + @FirstSearchWord + '\b') + 1)
set @Text = SUBSTRING(@Text, case @IndexOfDot when 0 then 0 else @TextLength - @IndexOfDot + 3 end, 300)
-- note: this inner loop empties #TempSearchWords, so it must be refilled (e.g. from @SearchWords) for each row
while (select COUNT(*) from #TempSearchWords) > 0
begin
select top 1 @CurrWord = Word from #TempSearchWords
set @Text = dbo.RegExReplace(@Text, '\b' + @CurrWord + '\b', '<b>' + SUBSTRING(@Text, dbo.RegExIndexOf(@Text, '\b' + @CurrWord + '\b'), LEN(@CurrWord) + 1) + '</b>')
delete from #TempSearchWords where Word = @CurrWord
end
-- note: as written, this copies the original row; the highlighted @Text is not stored
insert into @FinalResults
select * from #PrelimResults where [UID] = @CurrID
delete from #PrelimResults where [UID] = @CurrID
end
Several notes:
1. Nested while loops probably aren't the most efficient way of doing it, however nothing else comes to mind. If I were to use cursors, it would essentially be the same thing?
2. @FirstSearchWord here refers to the first occurrence in the text of one of the original search words, so essentially the text you are replacing will only be in the summary. Again, it's quite a basic method; some sort of text-cluster-finding algorithm would probably be handy.
3. To get RegEx in the first place, you need CLR user-defined functions.
It looks like you could parse the output of sys.dm_fts_parser, a dynamic management function new in SQL Server 2008, and use a regex, but I haven't looked at it too closely.
You might be missing the point of the database in this instance. Its job is to return the data that satisfies the conditions you gave it. I think you will want to implement the highlighting in your web control, probably using a regex.
Here is something a quick search would reveal:
http://www.dotnetjunkies.com/PrintContent.aspx?type=article&id=195E323C-78F3-4884-A5AA-3A1081AC3B35
Some details:
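' search_kiemeles: search terms to highlight (Hungarian "kiemelés" = highlighting); hirdetes: the ad text being shown (Hungarian "hirdetés" = advertisement)
search_kiemeles=replace(lcase(search),"""","") 'Lower-case the search string and strip double quotes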
do while not rs.eof 'The search result loop
hirdetes=rs("hirdetes")
data=RegExpValueA("([A-Za-zöüóőúéáűíÖÜÓŐÚÉÁŰÍ0-9]+)",search_kiemeles) 'Give back all the search words in an array, I need non-english characters also
For i=0 to Ubound(data,1)
hirdetes = RegExpReplace(hirdetes,"("&NoAccentRE(data(i))&")","<em>$1</em>")
Next
response.write hirdetes
rs.movenext
Loop
...
Functions
'All Match to Array
Function RegExpValueA(patrn, strng)
Dim regEx
Set regEx = New RegExp ' Create a regular expression.
regEx.IgnoreCase = True ' Set case insensitivity.
regEx.Global = True
Dim Match, Matches, RetStr
Dim data()
Dim count
count = 0
Redim data(-1) 'VBScript Ubound array bug workaround
if isnull(strng) or strng="" then
RegExpValueA = data
exit function
end if
regEx.Pattern = patrn ' Set pattern.
Set Matches = regEx.Execute(strng) ' Execute search.
For Each Match in Matches ' Iterate Matches collection.
count = count + 1
Redim Preserve data(count-1)
data(count-1) = Match.Value
Next
set regEx = nothing
RegExpValueA = data
End Function
'Replace non-english chars
Function NoAccentRE(accent_string)
NoAccentRE=accent_string
NoAccentRE=Replace(NoAccentRE,"a","§")
NoAccentRE=Replace(NoAccentRE,"á","§")
NoAccentRE=Replace(NoAccentRE,"§","[aá]")
NoAccentRE=Replace(NoAccentRE,"e","§")
NoAccentRE=Replace(NoAccentRE,"é","§")
NoAccentRE=Replace(NoAccentRE,"§","[eé]")
NoAccentRE=Replace(NoAccentRE,"i","§")
NoAccentRE=Replace(NoAccentRE,"í","§")
NoAccentRE=Replace(NoAccentRE,"§","[ií]")
NoAccentRE=Replace(NoAccentRE,"o","§")
NoAccentRE=Replace(NoAccentRE,"ó","§")
NoAccentRE=Replace(NoAccentRE,"ö","§")
NoAccentRE=Replace(NoAccentRE,"ő","§")
NoAccentRE=Replace(NoAccentRE,"§","[oóöő]")
NoAccentRE=Replace(NoAccentRE,"u","§")
NoAccentRE=Replace(NoAccentRE,"ú","§")
NoAccentRE=Replace(NoAccentRE,"ü","§")
NoAccentRE=Replace(NoAccentRE,"ű","§")
NoAccentRE=Replace(NoAccentRE,"§","[uúüű]")
end function
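Since the asker's application is in C#, here is the same highlighting idea as a compact, language-neutral sketch in Python. It splits the search string into words, builds an accent-insensitive pattern from the vowel groups used in NoAccentRE, and wraps matches in <em>. One design change from the VBScript: all words are matched in a single pass with one alternation, longest first, so a search word can never match inside an <em> tag inserted earlier. .NET's Regex.Replace with RegexOptions.IgnoreCase supports the same approach. All function names are hypothetical, and the input text is assumed to be plain text (HTML escaping is left out).

```python
import re
from typing import Iterable

# Vowel groups taken from NoAccentRE above (Hungarian accents).
_GROUPS = ["aá", "eé", "ií", "oóöő", "uúüű"]
# Every member of a group (plain or accented) maps to the whole character class.
ACCENT_MAP = {ch: "[" + group + "]" for group in _GROUPS for ch in group}


def accent_insensitive(word: str) -> str:
    """Regex for a search word that ignores the accents listed above."""
    return "".join(ACCENT_MAP.get(ch, re.escape(ch)) for ch in word.lower())


def search_words(search: str) -> list:
    """Split the search string into words after dropping double quotes."""
    return re.findall(r"\w+", search.replace('"', "").lower())


def highlight(text: str, words: Iterable[str]) -> str:
    """Wrap every occurrence of any search word in <em>...</em> in one pass."""
    terms = sorted({w.lower() for w in words if w}, key=len, reverse=True)
    if not terms:
        return text
    pattern = "|".join(accent_insensitive(t) for t in terms)
    return re.sub("(" + pattern + ")", r"<em>\1</em>", text, flags=re.IGNORECASE)


print(highlight("Szép időjárás", search_words('"idojaras"')))
```

The word list produced by sys.dm_fts_parser in the first answer could be passed as `words`, so that inflected forms are highlighted as well.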
