Searchable text is in 2 tables, how to design full text index? - sql-server

In a forum application, the actual name of the thread is stored in one table, and the replies are stored in another table.
Table_Thread
Subject varchar(255) e.g. "How to setup fulltext search"
Table_Replies (users replies here)
ReplyText text(not null)
Now I want to create a full-text search on both the subject and reply columns, but they seem very related so they should be in the same index.
Is it possible to do this?
I'm using sql server 2005.

Assuming there is an association between the subject and the replies you could create a view WITH SCHEMABINDING, create a UNIQUE CLUSTERED index on the view and then add that view to your fulltext catalog selecting the two columns you want included.
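A minimal sketch of this approach follows. The key columns (Table_Thread.Id, Table_Replies.Id, Table_Replies.ThreadId), the view name, and the catalog name myFullTextCatalog are assumptions, not part of the question. As far as I know, indexed views do not accept the legacy text type, so the sketch also assumes ReplyText has been converted to varchar(max). Because outer joins are not allowed in an indexed view, threads without any reply would not appear in it.

```sql
-- Hypothetical schema: Table_Thread(Id, Subject),
-- Table_Replies(Id, ThreadId, ReplyText varchar(max)).
CREATE VIEW dbo.ThreadReplySearch
WITH SCHEMABINDING
AS
SELECT r.Id AS ReplyId,
       t.Id AS ThreadId,
       t.Subject,
       r.ReplyText
FROM dbo.Table_Thread AS t
INNER JOIN dbo.Table_Replies AS r
    ON r.ThreadId = t.Id;
GO

-- The full-text key must be a unique, single-column, non-nullable index.
CREATE UNIQUE CLUSTERED INDEX UCI_ThreadReplySearch
ON dbo.ThreadReplySearch (ReplyId);
GO

-- Both text columns end up in the same full-text index.
CREATE FULLTEXT INDEX ON dbo.ThreadReplySearch (Subject, ReplyText)
KEY INDEX UCI_ThreadReplySearch
ON myFullTextCatalog;
GO
```

A search such as CONTAINS((Subject, ReplyText), 'fulltext') against the view then returns reply rows, from which ThreadId gives the matching threads.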

Under a heavy load of concurrent queries, an RDBMS can struggle to serve searches through SQL alone. What's more, plain SELECT statements support full-text search poorly. In that case you may need an IR (Information Retrieval) library such as Lucene for Java.

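You could create an indexed view containing a union of both indexed columns plus the PK of each table, e.g.:
CREATE VIEW SearchText
WITH SCHEMABINDING
AS
SELECT Subject AS [Text], Table_Thread_ID AS ID, 1 AS [Type] FROM dbo.Table_Thread
UNION ALL
SELECT ReplyText AS [Text], Table_Replies_ID AS ID, 2 AS [Type] FROM dbo.Table_Replies;
The Type values 1 and 2 are arbitrary; they are there because you need a unique key to build a fulltext index, and IDs from the two tables can collide.
Then create a unique index on (ID, Type), and finally your fulltext index. Treat this as a sketch of the idea: as far as I know, SQL Server rejects UNION in an indexed view and expects a single-column key for a fulltext index, so it may need adapting before it runs.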
CREATE UNIQUE INDEX SearchText_UK ON SearchText (ID, Type);
CREATE FULLTEXT CATALOG ft AS DEFAULT;
CREATE FULLTEXT INDEX ON SearchText(Text)
KEY INDEX SearchText_UK
WITH STOPLIST = SYSTEM;

I have seen what NopCommerce (a C# MVC open source e-commerce project) has done using fulltext search on 'products' and 'variants' while only returning 'products'. This is very similar to your case, because you want to search on 'Thread' and 'Replies' but obviously only want to return 'threads'. I have changed it to use threads and replies for you:
First, create a function that generates an index name by table (optional):
CREATE FUNCTION [dbo].[nop_getprimarykey_indexname]
(
@table_name nvarchar(1000) = null
)
RETURNS nvarchar(1000)
AS
BEGIN
DECLARE @index_name nvarchar(1000)
SELECT @index_name = i.name
FROM sys.tables AS tbl
INNER JOIN sys.indexes AS i ON (i.index_id > 0 and i.is_hypothetical = 0) AND (i.object_id=tbl.object_id)
WHERE (i.is_unique=1 and i.is_disabled=0) and (tbl.name=@table_name)
RETURN @index_name
END
GO
Then, enable fulltext by creating the catalog and the indexes:
EXEC('
IF NOT EXISTS (SELECT 1 FROM sys.fulltext_catalogs WHERE [name] = ''myFullTextCatalog'')
CREATE FULLTEXT CATALOG [myFullTextCatalog] AS DEFAULT')
DECLARE @create_index_text nvarchar(4000)
SET @create_index_text = '
IF NOT EXISTS (SELECT 1 FROM sys.fulltext_indexes WHERE object_id = object_id(''[Table_Thread]''))
CREATE FULLTEXT INDEX ON [Table_Thread]([Subject])
KEY INDEX [' + dbo.[nop_getprimarykey_indexname] ('Table_Thread') + '] ON [myFullTextCatalog] WITH CHANGE_TRACKING AUTO'
EXEC(@create_index_text)
SET @create_index_text = '
IF NOT EXISTS (SELECT 1 FROM sys.fulltext_indexes WHERE object_id = object_id(''[Table_Replies]''))
CREATE FULLTEXT INDEX ON [Table_Replies]([ReplyText])
KEY INDEX [' + dbo.[nop_getprimarykey_indexname] ('Table_Replies') + '] ON [myFullTextCatalog] WITH CHANGE_TRACKING AUTO'
EXEC(@create_index_text)
Then, in the stored procedure that finds threads by keywords, build a temporary table with the list of thread Ids that match the keywords:
INSERT INTO #KeywordThreads ([ThreadId])
SELECT t.Id
FROM Table_Thread t with (NOLOCK)
WHERE CONTAINS(t.[Subject], @Keywords)
UNION
SELECT r.ThreadId
FROM Table_Replies r with (NOLOCK)
WHERE CONTAINS(r.[ReplyText], @Keywords)
Now you can use the temporary table #KeywordThreads to join with the list of threads and return them.
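A minimal sketch of that last step, assuming the temp table has a single ThreadId column and that Table_Thread exposes Id and Subject as in the snippets above (the exact column types are assumptions):

```sql
-- Assumed shape of the temp table that the INSERT above fills.
CREATE TABLE #KeywordThreads (ThreadId int NOT NULL);

-- ... INSERT INTO #KeywordThreads ... (keyword matching, as shown earlier) ...

-- Return each matching thread once. The UNION in the INSERT already
-- removes duplicate ThreadId values, so a plain join is enough.
SELECT t.Id, t.[Subject]
FROM Table_Thread AS t
INNER JOIN #KeywordThreads AS k
    ON k.ThreadId = t.Id;

DROP TABLE #KeywordThreads;
```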
I hope this helps.

Related

Executing dynamic SQL with return value as a column value for each rows

I have a rather simple query that I started to modify in order to remove a temp table, as we have concurrency issues across many different systems and clients.
Right now the simple solution was to break up the query into multiple separate queries to replicate what SQL was doing before.
I am trying to figure out a way to return the result of a dynamic SQL query as a column value. The new query is quite simple: it looks in the system objects for all tables with a specific name format and outputs them. What I am missing is that for each record I need to output the result of a dynamic query on each of those tables.
The query :
SELECT [name] as 'TableName'
FROM SYSOBJECTS WHERE xtype = 'U'
AND (CHARINDEX('_PCT', [name]) <> 0
OR CHARINDEX('_WHT', [name]) <> 0)
All these tables have a common column called Result, which is a float. What I am trying to do is return a count from each table under a generic WHERE clause that works with all of the tables.
A desired query (I know it's not valid) would be:
SELECT [name] as 'TableName',
sp_executesql 'SELECT COUNT(*) FROM ' + [name] + ' WHERE Result > 0 OR (Result < 139 AND CurrentIndex < 15)' as 'ResultValue'
FROM SYSOBJECTS WHERE xtype = 'U'
AND (CHARINDEX('_PCT', [name]) <> 0
OR CHARINDEX('_WHT', [name]) <> 0)
Before, it used to be easy. We had a temp table with 2 columns and filled in the table names first. Then we iterated over the temp table, executed the dynamic SQL, returned the value in an OUTPUT variable, updated the matching record of the temp table, and finally returned the table.
I have tried a scalar function, but it doesn't support dynamic SQL so it doesn't work. I would rather not create ~13,000 different queries for the ~13,000 tables.
I have tried using a reference table with triggers to update the status, but it slows the system way too much. The average table inserts and deletes 28 million records. The original temp table query only took 5-6 minutes to execute thanks to very good indexing, and now we are reaching 25-30 minutes.
Is there any solution other than querying the table list and then having the client query each table one by one to learn its status?
We are using SQL Server 2017, in case some new features are available now.
You can use this script for your purpose (tested in SQL Server 2016).
Updated: It should work now as the results are a single set now.
EXEC sp_msforeachtable
@precommand = 'CREATE TABLE ##Statistics
(TableName varchar(128) NOT NULL,
NumOfRows int)',
@command1 ='INSERT INTO ##Statistics (TableName, NumOfRows)
SELECT ''?'' Table_Name, COUNT(*) Row_Count FROM ? WHERE Result > 0 OR (Result < 139 AND CurrentIndex < 15)',
@postcommand = 'SELECT TableName, NumOfRows FROM ##Statistics;
DROP TABLE ##Statistics'
,@whereand = ' And Object_id In (Select Object_id From sys.objects
Where name like ''%_PCT%'' OR name like ''%_WHT%'')'
For more details on sp_msforeachtable Please visit this link
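Since the question mentions SQL Server 2017, another option that avoids any temp table is to build one UNION ALL statement with STRING_AGG and run it through sp_executesql. This is a sketch rather than a tested solution: it assumes every matching table has the Result and CurrentIndex columns, and with roughly 13,000 tables a single statement may be too large to compile, in which case the list would have to be processed in batches.

```sql
DECLARE @sql nvarchar(max);

-- One "SELECT name, COUNT(*)" branch per matching table, glued with UNION ALL.
-- [_] matches a literal underscore, like the CHARINDEX test in the question.
SELECT @sql = STRING_AGG(
           CAST(N'SELECT ' + QUOTENAME(t.name, '''') + N' AS TableName, '
                + N'COUNT(*) AS ResultValue FROM '
                + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(t.name)
                + N' WHERE Result > 0 OR (Result < 139 AND CurrentIndex < 15)'
                AS nvarchar(max)),
           N' UNION ALL ')
FROM sys.tables AS t
WHERE t.name LIKE N'%[_]PCT%'
   OR t.name LIKE N'%[_]WHT%';

-- @sql is NULL when no table matches, so guard the call.
IF @sql IS NOT NULL
    EXEC sys.sp_executesql @sql;
```

The result is a single set with one row per table, which is the shape the desired (invalid) query in the question was aiming for.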

SQL Server full text search on column with alias

I am using full-text search. It works fine on a direct column of a table, but not on a derived/aliased column.
SELECT ExpectationId
,ExpectationName
,(
CASE
WHEN ExpectationOrganization_OrganizationId IS NOT NULL
THEN (
SELECT OrganizationName
FROM Organizations
WHERE OrganizationId = ExpectationOrganization_OrganizationId
)
WHEN ExpectationBeneficiary_BeneficiaryId IS NOT NULL
THEN (
SELECT BeneficiaryName
FROM Beneficiaries
WHERE BeneficiaryId = ExpectationBeneficiary_BeneficiaryId
)
ELSE (
SELECT TeamName
FROM Teams
WHERE TeamId = ExpectationTeam_TeamId
)
END
) AS ParentName
FROM Expectations
WHERE
FREETEXT(ExpectationName, @Keyword) ---Working
OR FREETEXT(ParentName, @Keyword) ---Not working
All these columns ExpectationName, OrganizationName, BeneficiaryName, TeamName are full-text indexed.
How can I make it work for ParentName column?
You need to create a VIEW based on your query first then add a Full-Text index to it which will include the ParentName column. Without the Full-Text index over a column being searched neither FREETEXT nor CONTAINS will work.
Something like that should help you:
CREATE VIEW ExpectationsView AS
SELECT ExpectationId
,ExpectationName
,(
CASE
WHEN ExpectationOrganization_OrganizationId IS NOT NULL
THEN (
SELECT OrganizationName
FROM Organizations
WHERE OrganizationId = ExpectationOrganization_OrganizationId
)
WHEN ExpectationBeneficiary_BeneficiaryId IS NOT NULL
THEN (
SELECT BeneficiaryName
FROM Beneficiaries
WHERE BeneficiaryId = ExpectationBeneficiary_BeneficiaryId
)
ELSE (
SELECT TeamName
FROM Teams
WHERE TeamId = ExpectationTeam_TeamId
)
END
) AS ParentName
FROM Expectations
GO
-- This index is needed for FTS index.
-- Note, I trust ExpectationId column is unique in your SELECT above,
-- if it's not, the below CREATE INDEX will fail and you will need to provide
-- a new column to your VIEW which will uniquely identify each row, then use
-- that PK-like column in the below index
CREATE UNIQUE CLUSTERED INDEX PK_ExpectationsView
ON ExpectationsView (ExpectationId);
GO
CREATE FULLTEXT CATALOG fts_catalog;
GO
CREATE FULLTEXT INDEX ON ExpectationsView
(
ExpectationName Language 1033,
ParentName Language 1033
)
KEY INDEX PK_ExpectationsView
ON fts_catalog
WITH (CHANGE_TRACKING = AUTO);
GO
Once the Full-Text index which includes the relevant columns is there you can use FREETEXT or CONTAINS in queries:
SELECT ExpectationId, ExpectationName, ParentName FROM ExpectationsView
WHERE FREETEXT(ExpectationName, @Keyword) OR FREETEXT(ParentName, @Keyword)
Note, the above code I've provided off the top of my head because I don't have data schema for your case so couldn't try running it. However, it should give you the general idea on how to proceed. HTH.
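If the view turns out not to be indexable (as far as I know, an indexed view needs SCHEMABINDING and does not accept subqueries or outer joins), a different technique is to skip the view and search the base columns directly, since the question states they are already full-text indexed. The sketch below is an alternative to the answer above, not a restatement of it. It assumes the Id columns are unique in the three parent tables, and the @Keyword value is only a placeholder.

```sql
DECLARE @Keyword nvarchar(200) = N'example';  -- placeholder search term

SELECT e.ExpectationId,
       e.ExpectationName,
       CASE
           WHEN e.ExpectationOrganization_OrganizationId IS NOT NULL THEN o.OrganizationName
           WHEN e.ExpectationBeneficiary_BeneficiaryId IS NOT NULL THEN b.BeneficiaryName
           ELSE t.TeamName
       END AS ParentName
FROM Expectations AS e
LEFT JOIN Organizations AS o
    ON o.OrganizationId = e.ExpectationOrganization_OrganizationId
LEFT JOIN Beneficiaries AS b
    ON b.BeneficiaryId = e.ExpectationBeneficiary_BeneficiaryId
LEFT JOIN Teams AS t
    ON t.TeamId = e.ExpectationTeam_TeamId
WHERE FREETEXT(e.ExpectationName, @Keyword)
   -- Each branch mirrors one arm of the CASE, so only the name that
   -- is actually shown as ParentName is searched.
   OR (e.ExpectationOrganization_OrganizationId IS NOT NULL
       AND FREETEXT(o.OrganizationName, @Keyword))
   OR (e.ExpectationOrganization_OrganizationId IS NULL
       AND e.ExpectationBeneficiary_BeneficiaryId IS NOT NULL
       AND FREETEXT(b.BeneficiaryName, @Keyword))
   OR (e.ExpectationOrganization_OrganizationId IS NULL
       AND e.ExpectationBeneficiary_BeneficiaryId IS NULL
       AND FREETEXT(t.TeamName, @Keyword));
```

Several full-text predicates combined with OR can be slow on large tables, so it is worth comparing the plan against separate queries combined with UNION.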

SQL Server index - ideas?

I have this query :
SELECT
c.violatorname
FROM
dbo.crimecases AS c,
dbo.people AS p
WHERE
REPLACE(c.violatorname, ' ', '') = CONCAT(CONCAT(CONCAT(p.firstname, p.secondname), p.thirdname), p.lastname);
The query is very slow. I need to create an index on the violatorname column that takes the REPLACE function into account. Any ideas?
I would suggest adding computed columns and creating indexes on them.
ALTER TABLE crimecases
ADD violatornameProcessed AS Replace(violatorname, ' ', '') PERSISTED
ALTER TABLE people
ADD fullName AS Concat(firstname, secondname, thirdname, lastname) PERSISTED
PERSISTED stores the computed value on disk instead of computing it every time. Now create indexes on the new columns.
CREATE INDEX Nix_crimecases_violatornameProcessed
ON crimecases (violatornameProcessed)
include (violatorname)
CREATE INDEX Nix_people_fullName
ON people (fullName)
Query can be written like
SELECT c.violatorname
FROM dbo.crimecases AS c
INNER JOIN dbo.people AS p
ON c.violatornameProcessed = p.fullName

SQL Server Scripting Partitioning

Had a good look on the net and books online and couldn't find an answer to my question, so here goes.
Working on someone else's design, I have several tables all tied to the same partition schema and partition function. I wish to perform a split operation which would affect many hundreds of millions of rows.
To split is no problem:
ALTER PARTITION SCHEME [ps_Scheme] NEXT USED [FG1] ;
ALTER PARTITION FUNCTION [pfcn_Function]() SPLIT RANGE (20120331)
However, I'm concerned that this will affect many tables at once and is not desirable.
Therefore, I was going to create a new copy of the table and do the split on a new function
CREATE PARTITION FUNCTION [pfcn_Function1](INT)
AS RANGE RIGHT
FOR VALUES
(
20090101, 20090130, 20090131, 20090201...etc
)
CREATE PARTITION SCHEME [ps_Scheme1]
AS PARTITION [pfcn_Function1] TO
([FG1], [FG2] etc
CREATE TABLE [dbo].[myTableCopy]
(
....
) ON ps_Scheme1
Then I would switch the partition I wish to split across:
-- The partition numbers did not align because they are based on 2 different functions.
ALTER TABLE [Table] SWITCH PARTITION 173 TO [TableCopy] PARTITION 172
Finally my question is can this be automated? You can make a copy of the table easily in SQL using SELECT INTO, but I cannot see how to automate the partitioning of the table i.e. the bit on the end of the CREATE TABLE statement that points to the partition scheme.
Thanks for any responses.
Found this on books online:
You can turn an existing nonpartitioned table into a partitioned table in one of two ways.
One way is to create a partitioned clustered index on the table by using the CREATE INDEX statement.
This action is similar to creating a clustered index on any table, because SQL Server essentially
drops the table and re-creates it in a clustered index format. If the table already has a
partitioned clustered index applied to it, you can drop the index and rebuild it on a partition
scheme by using CREATE INDEX with the DROP_EXISTING = ON clause.
I think this might solve my problem.
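A sketch of how the quoted approach could be scripted: copy the structure with SELECT INTO, then place the copy on the new scheme by creating its clustered index there. The table name dbo.myTable, the partitioning column DateKey, and the index name are assumptions; ps_Scheme1 is the scheme from the question.

```sql
-- Copy the column definitions only (no rows) into a new heap.
SELECT *
INTO dbo.myTableCopy
FROM dbo.myTable
WHERE 1 = 0;

-- Creating the clustered index on the partition scheme moves the
-- table onto that scheme, partitioned by the assumed DateKey column.
CREATE CLUSTERED INDEX CIX_myTableCopy_DateKey
ON dbo.myTableCopy (DateKey)
ON ps_Scheme1 (DateKey);
```

SELECT INTO does not copy indexes or constraints, and SWITCH expects the source and target to match in structure, so any remaining indexes and constraints would still have to be scripted onto the copy.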
It can be automated, but I'm not sure it is worth it. If it is only 'several' tables, not hundreds, then it is better to just script out each table and then build a script that does the copy out/split the copy/switch out/split the source/switch in.
Automating this would involve dynamically building the temp table definition(s), including all indexes, from sys.tables/sys.columns/sys.indexes/sys.index_columns and other similar views. Same way SMO Scripting does it.
Yes, you can switch partitions in an automated process. Here is a code sample you can customise. It is driven from a metadata table.
CREATE TABLE [dbo].[PartitionTableSetup](
[Id] [int] IDENTITY(1,1) NOT NULL,
[TableName] [varchar](256) NULL,
[SwitchTable] [varchar](256) NULL,
[Partition] [int] NULL)
DECLARE @merge nvarchar(max)
select @merge = (
Select N'' + com + '' from (
Select N' ALTER TABLE '
+ TableName +
' SWITCH PARTITION 2 TO '
+ SwitchTable
+ ' PARTITION 2 Truncate table '
+ SwitchTable as com
,value
,1 as ord
From (
SELECT convert(datetime,value) as value
,pt.TableName
,pt.SwitchTable
FROM sys.partition_range_values AS RV
JOIN sys.partition_functions AS PF
ON RV.function_id = PF.function_id
Join dbo.[Partitions] pr
On name = PartitionFunction
Join dbo.PartitionTableSetup pt
On pt.[Partition] = pr.ID
WHERE datediff(d,convert(datetime,value),GETDATE()) > pr.[Range] -3
) a
Union all
Select N' ALTER PARTITION FUNCTION '
+ b.PartitionFunction
+ '() MERGE RANGE ('''
+ Convert(nvarchar,value,121)
+''')' as com
,value
,2 as ord
From (
SELECT convert(datetime,value) as value
,pr.PartitionFunction
FROM sys.partition_range_values AS RV
JOIN sys.partition_functions AS PF
ON RV.function_id = PF.function_id
Join dbo.[Partitions] pr
On name = PartitionFunction
WHERE datediff(d,convert(datetime,value),GETDATE()) > pr.[Range] -3
) b
) c Order by value
, ord
for xml path ('')
)
EXECUTE (@merge)

How do you check if a certain index exists in a table?

Something like this:
SELECT
*
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
WHERE CONSTRAINT_NAME ='FK_TreeNodesBinaryAssets_BinaryAssets'
and TABLE_NAME = 'TreeNodesBinaryAssets'
but for indexes.
You can do it using a straightforward select like this:
SELECT *
FROM sys.indexes
WHERE name='YourIndexName' AND object_id = OBJECT_ID('Schema.YourTableName')
For SQL 2008 and newer, a more concise method, coding-wise, to detect index existence is by using the INDEXPROPERTY built-in function:
INDEXPROPERTY ( object_ID , index_or_statistics_name , property )
The simplest usage is with the IndexID property:
If IndexProperty(Object_Id('MyTable'), 'MyIndex', 'IndexID') Is Null
If the index exists, the above will return its ID; if it doesn't, it will return NULL.
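For instance, the check can guard the creation of a missing index. The table, index, and column names below are placeholders:

```sql
-- INDEXPROPERTY returns NULL when the index (or the table) does not exist.
IF INDEXPROPERTY(OBJECT_ID(N'dbo.MyTable'), N'MyIndex', 'IndexID') IS NULL
BEGIN
    CREATE INDEX MyIndex ON dbo.MyTable (MyColumn);
END;
```

Note that the function also returns NULL when the table name itself is wrong, so a typo in the table name would only surface when the CREATE INDEX runs.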
AdaTheDEV, I used your syntax to create the following, and here is why.
Problem: A process that runs once a quarter takes an hour due to a missing index.
Correction: Alter the query or procedure to check for the index and create it if missing. The same check is placed at the end of the query or procedure to remove the index, since it is only needed quarterly. Only the drop syntax is shown here:
-- drop the index
begin
IF EXISTS (SELECT * FROM sys.indexes WHERE name='Index_Name'
AND object_id = OBJECT_ID('[SchemaName].[TableName]'))
begin
DROP INDEX [Index_Name] ON [SchemaName].[TableName];
end
end
If the hidden purpose of your question is to DROP the index before making an INSERT into a large table, then this is a useful one-liner:
DROP INDEX IF EXISTS [IndexName] ON [dbo].[TableName]
This syntax is available since SQL Server 2016. Documentation for IF EXISTS:
https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/11/03/drop-if-exists-new-thing-in-sql-server-2016/
In case you are dealing with a primary key instead, use this:
ALTER TABLE [TableName] DROP CONSTRAINT IF EXISTS [PK_name]
This is a slight deviation from the original question, but it may prove useful for future people landing here who want to DROP and CREATE an index, e.g. in a deployment script.
You can bypass the exists check simply by adding the following to your create statement:
CREATE INDEX IX_IndexName
ON dbo.TableName (ColumnName) -- ColumnName is a placeholder for the indexed column(s)
WITH (DROP_EXISTING = ON);
Read more here: CREATE INDEX (Transact-SQL) - DROP_EXISTING Clause
N.B. As mentioned in the comments, the index must already exist for this clause to work without throwing an error.
Wrote the below function that allows me to quickly check to see if an index exists; works just like OBJECT_ID.
CREATE FUNCTION INDEX_OBJECT_ID (
@tableName VARCHAR(128),
@indexName VARCHAR(128)
)
RETURNS INT
AS
BEGIN
DECLARE @objectId INT
SELECT @objectId = i.object_id
FROM sys.indexes i
WHERE i.object_id = OBJECT_ID(@tableName)
AND i.name = @indexName
RETURN @objectId
END
GO
EDIT: This just returns the OBJECT_ID of the table, but it will be NULL if the index doesn't exist. I suppose you could set this to return index_id, but that isn't super useful.
-- Delete index if exists
IF EXISTS(SELECT TOP 1 1 FROM sys.indexes indexes INNER JOIN sys.objects
objects ON indexes.object_id = objects.object_id WHERE indexes.name
='Your_Index_Name' AND objects.name = 'Your_Table_Name')
BEGIN
PRINT 'DROP INDEX [Your_Index_Name] ON [dbo].[Your_Table_Name]'
DROP INDEX [Your_Index_Name] ON [dbo].[Your_Table_Name]
END
GO
EXEC sp_helpindex '[[[SCHEMA-NAME.TABLE-NAME]]]'
GO
To check whether a clustered index exists on a particular table:
SELECT * FROM sys.indexes
WHERE index_id = 1 AND object_id = OBJECT_ID('Table_Name')

Resources