SQL Server index - ideas? - sql-server

I have this query :
SELECT
c.violatorname
FROM
dbo.crimecases AS c,
dbo.people AS p
WHERE
REPLACE(c.violatorname, ' ', '') = CONCAT(CONCAT(CONCAT(p.firstname, p.secondname), p.thirdname), p.lastname);
The query is very slow. I need to create an index on the violatorname column that takes the REPLACE function into account. Any ideas?

I would suggest adding computed columns and creating indexes on them.
ALTER TABLE crimecases
ADD violatornameProcessed AS Replace(violatorname, ' ', '') PERSISTED
ALTER TABLE people
ADD fullName AS Concat(firstname, secondname, thirdname, lastname) PERSISTED
PERSISTED stores the computed values on disk instead of computing them every time the column is read. Now create indexes on the new columns.
CREATE INDEX Nix_crimecases_violatornameProcessed
ON crimecases (violatornameProcessed)
include (violatorname)
CREATE INDEX Nix_people_fullName
ON people (fullName)
The query can then be written as:
SELECT c.violatorname
FROM dbo.crimecases AS c
INNER JOIN dbo.people AS p
ON c.violatornameProcessed = p.fullName
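To check that the computed columns above were created as persisted, the catalog view sys.computed_columns can be queried. A minimal sketch, assuming both tables live in the dbo schema:

```sql
-- List the computed columns on the two tables, with their definitions.
-- Assumes the tables are in the dbo schema.
SELECT OBJECT_NAME(cc.object_id) AS table_name,
       cc.name                   AS column_name,
       cc.definition,
       cc.is_persisted           -- 1 = values are stored on disk
FROM sys.computed_columns AS cc
WHERE cc.object_id IN (OBJECT_ID(N'dbo.crimecases'), OBJECT_ID(N'dbo.people'));
```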

Related

Query tuning required for expensive query

Can someone help me optimize this code? I could optimize it with a computed column, but we cannot change the schema on prod because we are not sure how many APIs push data into this table. The table has millions of rows, and adding a non-clustered index does not help: because of the query cost, the optimizer still goes for a scan.
create table testcts(
name varchar(100)
)
go
insert into testcts(
name
)
select 'VK.cts.com'
union
select 'GK.ms.com'
go
DECLARE @list varchar(100) = 'VK,GK'
select * from testcts where replace(replace(name,'.cts.com',''),'.ms.com','') in (select value from string_split(@list,','))
drop table testcts
One possibility might be to strip off the .cts.com and .ms.com subdomain/domain endings before you insert or store the name data in your table. Then, use the following query instead:
SELECT *
FROM testcts
WHERE name IN (SELECT value FROM STRING_SPLIT(@list, ','));
Now SQL Server should be able to use an index on the name column.
If your values are always suffixed by cts.com or ms.com you could add that to the search pattern:
SELECT {YourColumns} --Don't use *
FROM dbo.testcts t
JOIN (SELECT CONCAT(SS.[value], V.Suffix) AS [value]
FROM STRING_SPLIT(@list, ',') SS
CROSS APPLY (VALUES ('.cts.com'),
('.ms.com')) V (Suffix) ) L ON t.[name] = L.[value];
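For testing, the join above can be wrapped in a small script. This is only a sketch: it assumes the testcts table from the question still exists, the index name ix_testcts_name is made up, and STRING_SPLIT assumes SQL Server 2016 or later with database compatibility level 130 or higher.

```sql
-- Hypothetical index name; supports equality lookups on name.
CREATE NONCLUSTERED INDEX ix_testcts_name ON dbo.testcts (name);
GO
DECLARE @list varchar(100) = 'VK,GK';

-- Expand each list entry into its two possible full names,
-- then join on the unmodified column so an index seek is possible.
SELECT t.name
FROM dbo.testcts AS t
JOIN (SELECT CONCAT(SS.[value], V.Suffix) AS [value]
      FROM STRING_SPLIT(@list, ',') AS SS
      CROSS APPLY (VALUES ('.cts.com'), ('.ms.com')) AS V (Suffix)) AS L
  ON t.name = L.[value];
```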

SQL Server : select and add to new column of same table

Given this kind of syntax:
SELECT 'all the left part of 'email' column before @'
FROM [dbname].[tablename]
how do I insert the result of the SELECT query into a new column called 'email_left'?
PS: for simplicity, I do not show the long query after my SELECT.
It is an update:
UPDATE x
SET [new column created outside this statement] = 'all the left part of "email" column before @'
FROM [dbname].[tablename] AS x
Another solution is a computed column:
ALTER TABLE dbo.table_name
ADD column_name AS 'all the left part of ' + email + ' column before @'
GO
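As a more concrete sketch of both options, the part of the address before the @ can be extracted with LEFT and CHARINDEX. The table dbo.contacts_demo, its sample rows, and the column email_left_computed are made up for illustration; email_left is the column name from the question.

```sql
-- Hypothetical demo table; not from the original question.
CREATE TABLE dbo.contacts_demo (email varchar(255) NULL);
INSERT INTO dbo.contacts_demo (email)
VALUES ('alice@example.com'), ('no-at-sign'), (NULL);
GO
-- Option 1: a regular column filled by an UPDATE.
ALTER TABLE dbo.contacts_demo ADD email_left varchar(255) NULL;
GO
UPDATE dbo.contacts_demo
SET email_left = CASE
                     WHEN CHARINDEX('@', email) > 0
                         THEN LEFT(email, CHARINDEX('@', email) - 1)
                     ELSE email  -- no @ (or NULL): keep the value as-is
                 END;
GO
-- Option 2: a computed column that stays in sync automatically.
ALTER TABLE dbo.contacts_demo
ADD email_left_computed AS
    CASE
        WHEN CHARINDEX('@', email) > 0
            THEN LEFT(email, CHARINDEX('@', email) - 1)
        ELSE email
    END;
```

The CASE guard matters because LEFT raises an error when given a negative length, which is what CHARINDEX('@', email) - 1 yields for a value without an @.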
Update:
IF OBJECT_ID('dbo.email', 'U') IS NOT NULL
DROP TABLE dbo.email
GO
SELECT email
INTO dbo.email
FROM [dbname].[tablename]
GO
--CREATE CLUSTERED INDEX index_name ON dbo.email (email)

Search index in SQL Server ignoring special characters

I have an [nvarchar] column in a SQL Server table containing data like 123456789, 123-456789, 1234.56.789, 1.23456-789 and so on. The users just add dots, minus and spaces somewhere for readability and I don't know where.
Is there a way to create an index that ignores special characters and finds these values when searching for plain "123456789"?
No, there is no way to do exactly what you want in the way that you want.
The best mechanism for doing this is to use a computed column. It does not need to be persisted to be indexed.
Initial Position
CREATE TABLE YourTable
(
YourColumn NVARCHAR(50)
);
INSERT INTO YourTable
VALUES ('123456789'),
('123-456789'),
('1234.56.789'),
('1.23456-789');
Create a computed column and index it:
ALTER TABLE YourTable
ADD CanonicalForm AS
CAST(REPLACE(REPLACE(REPLACE(YourColumn, '.', ''), '-', ''), ' ', '') AS NVARCHAR(50));
CREATE INDEX ix
ON YourTable(CanonicalForm)
INCLUDE (YourColumn);
Test it
SELECT *
FROM YourTable
WHERE CanonicalForm = '123456789'
The execution plan shows a seek on the index.
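If the search term itself may contain separators, it can be normalized with the same expression before comparing. A small sketch using the YourTable and CanonicalForm names from above; the variable @search is introduced here for illustration:

```sql
DECLARE @search NVARCHAR(50) = N'123-456.789';

-- Strip the same characters from the input that the computed column strips,
-- so both sides of the comparison are in canonical form.
SELECT YourColumn
FROM YourTable
WHERE CanonicalForm = REPLACE(REPLACE(REPLACE(@search, '.', ''), '-', ''), ' ', '');
```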

Searchable text is in 2 tables, how to design full text index?

In a forum application, the actual name of the thread is stored in one table, and the replies are stored in another table.
Table_Thread
Subject varchar(255) e.g. "How to setup fulltext search"
Table_Replies (users replies here)
ReplyText text(not null)
Now I want to create a full-text search on both the subject and reply columns, but they seem very related so they should be in the same index.
Is it possible to do this?
I'm using sql server 2005.
Assuming there is an association between the subject and the replies you could create a view WITH SCHEMABINDING, create a UNIQUE CLUSTERED index on the view and then add that view to your fulltext catalog selecting the two columns you want included.
Under a heavy load of concurrent queries, an RDBMS may not be able to keep up using SQL alone. What's more, SQL SELECT statements support full-text search poorly. So you may need an IR (Information Retrieval) library such as Lucene for Java.
You could create an indexed view containing a union of both indexed columns plus the PK of each table, e.g.:
CREATE VIEW SearchText
WITH SCHEMABINDING
AS
SELECT Subject AS [Text], Table_Thread_ID AS ID, 1 AS [Type] FROM dbo.Table_Thread
UNION ALL
SELECT ReplyText AS [Text], Table_Replies_ID AS ID, 2 AS [Type] FROM dbo.Table_Replies;
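I chose the type values 1 and 2 arbitrarily, since you need a unique key to build a full-text index.
Then create a unique index on (ID, Type), and finally your full-text index. Be aware that SQL Server rejects UNION in an indexed view and requires the full-text KEY INDEX to be a unique, single-column, non-nullable index, so the statements below illustrate the idea and will need adapting.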
CREATE UNIQUE INDEX SearchText_UK ON SearchText (ID, Type);
CREATE FULLTEXT CATALOG ft AS DEFAULT;
CREATE FULLTEXT INDEX ON SearchText(Text)
KEY INDEX SearchText_UK
WITH STOPLIST = SYSTEM;
I have seen what NopCommerce (a C# MVC open-source e-commerce platform) does: it uses full-text search on 'products' and 'variants' and returns only 'products'. This is very similar to your case, because you want to search on 'Thread' and 'Replies' but obviously want to return only threads. I have changed it to use threads and replies for you:
First, create a function that generates an index name by table (optional):
CREATE FUNCTION [dbo].[nop_getprimarykey_indexname]
(
@table_name nvarchar(1000) = null
)
RETURNS nvarchar(1000)
AS
BEGIN
DECLARE @index_name nvarchar(1000)
SELECT @index_name = i.name
FROM sys.tables AS tbl
INNER JOIN sys.indexes AS i ON (i.index_id > 0 and i.is_hypothetical = 0) AND (i.object_id=tbl.object_id)
WHERE (i.is_unique=1 and i.is_disabled=0) and (tbl.name=@table_name)
RETURN @index_name
END
GO
Then, enable fulltext by creating the catalog and the indexes:
EXEC('
IF NOT EXISTS (SELECT 1 FROM sys.fulltext_catalogs WHERE [name] = ''myFullTextCatalog'')
CREATE FULLTEXT CATALOG [myFullTextCatalog] AS DEFAULT')
DECLARE @create_index_text nvarchar(4000)
SET @create_index_text = '
IF NOT EXISTS (SELECT 1 FROM sys.fulltext_indexes WHERE object_id = object_id(''[Table_Thread]''))
CREATE FULLTEXT INDEX ON [Table_Thread]([Subject])
KEY INDEX [' + dbo.[nop_getprimarykey_indexname] ('Table_Thread') + '] ON [myFullTextCatalog] WITH CHANGE_TRACKING AUTO'
EXEC(@create_index_text)
SET @create_index_text = '
IF NOT EXISTS (SELECT 1 FROM sys.fulltext_indexes WHERE object_id = object_id(''[Table_Replies]''))
CREATE FULLTEXT INDEX ON [Table_Replies]([ReplyText])
KEY INDEX [' + dbo.[nop_getprimarykey_indexname] ('Table_Replies') + '] ON [myFullTextCatalog] WITH CHANGE_TRACKING AUTO'
EXEC(@create_index_text)
Then, in the stored procedure that obtains threads by keywords, build a temporary table with a list of thread IDs that match the keywords:
INSERT INTO #KeywordThreads ([ThreadId])
SELECT t.Id
FROM Table_Thread t with (NOLOCK)
WHERE CONTAINS(t.[Subject], @Keywords)
UNION
SELECT r.ThreadId
FROM Table_Replies r with (NOLOCK)
WHERE CONTAINS(r.[ReplyText], @Keywords)
Now you can use the temporary table #KeywordThreads to join with the list of threads and return them.
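The final join might look like the sketch below. The definition of #KeywordThreads and the selected columns are assumptions, since the original procedure is not shown.

```sql
-- Assumed shape of the temporary table filled by the INSERT above.
CREATE TABLE #KeywordThreads (ThreadId int NOT NULL);

-- ... run the INSERT INTO #KeywordThreads statement here ...

-- Return each matching thread once; the UNION already removed duplicates.
SELECT t.Id, t.[Subject]
FROM Table_Thread AS t
INNER JOIN #KeywordThreads AS k
    ON k.ThreadId = t.Id;
```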
I hope this helps.

SQL Server Scripting Partitioning

Had a good look on the net and books online and couldn't find an answer to my question, so here goes.
Working on someone else's design, I have several tables all tied to the same partition schema and partition function. I wish to perform a split operation which would affect many hundreds of millions of rows.
To split is no problem:
ALTER PARTITION SCHEME [ps_Scheme] NEXT USED [FG1] ;
ALTER PARTITION FUNCTION [pfcn_Function]() SPLIT RANGE (20120331)
However, I'm concerned that this will affect many tables at once and is not desirable.
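To see which tables and indexes a split would touch, the catalog views can be queried for everything built on the partition function. A sketch using the function name from the statement above:

```sql
-- List every table/index placed on a scheme that uses the partition function.
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
       OBJECT_NAME(i.object_id)        AS table_name,
       i.name                          AS index_name,   -- NULL for a heap
       ps.name                         AS partition_scheme
FROM sys.indexes AS i
JOIN sys.partition_schemes AS ps
    ON ps.data_space_id = i.data_space_id
JOIN sys.partition_functions AS pf
    ON pf.function_id = ps.function_id
WHERE pf.name = N'pfcn_Function'
ORDER BY schema_name, table_name, i.index_id;
```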
Therefore, I was going to create a new copy of the table and do the split on a new function
CREATE PARTITION FUNCTION [pfcn_Function1](INT)
AS RANGE RIGHT
FOR VALUES
(
20090101, 20090130, 20090131, 20090201...etc
)
CREATE PARTITION SCHEME [ps_Scheme1]
AS PARTITION [pfcn_Function1] TO
([FG1], [FG2] etc
CREATE TABLE [dbo].[myTableCopy]
(
....
) ON ps_Scheme1
Then I would switch the partition I wish to split across:
-- The partition numbers did not align because they are based on 2 different functions.
ALTER TABLE [Table] SWITCH PARTITION 173 TO [TableCopy] PARTITION 172
Finally, my question: can this be automated? You can make a copy of the table easily in SQL using SELECT INTO, but I cannot see how to automate the partitioning of the table, i.e. the bit on the end of the CREATE TABLE statement that points to the partition scheme.
Thanks for any responses.
Found this on books online:
You can turn an existing nonpartitioned table into a partitioned table in one of two ways.
One way is to create a partitioned clustered index on the table by using the CREATE INDEX statement.
This action is similar to creating a clustered index on any table, because SQL Server essentially
drops the table and re-creates it in a clustered index format. If the table already has a
partitioned clustered index applied to it, you can drop the index and rebuild it on a partition
scheme by using CREATE INDEX with the DROP_EXISTING = ON clause.
I think this might solve my problem.
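Applied to the copy-table approach, that might look like the sketch below. The source table name dbo.myTable, the index name cix_myTableCopy, and the partitioning column DateKey are hypothetical; ps_Scheme1 is the scheme from the question.

```sql
-- Copy only the column definitions (no rows); SELECT INTO does not copy
-- indexes or constraints, so the result is an unpartitioned heap.
SELECT *
INTO dbo.myTableCopy
FROM dbo.myTable
WHERE 1 = 0;

-- Building the clustered index on the partition scheme partitions the table.
-- DateKey stands in for the real partitioning column.
CREATE CLUSTERED INDEX cix_myTableCopy
ON dbo.myTableCopy (DateKey)
ON ps_Scheme1 (DateKey);
```

Before a SWITCH, the copy would still need indexes and constraints that match the source table.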
It can be automated, but I'm not sure it is worth it. If it is only 'several' tables, not hundreds, then it is better to just script out each table and then build a script that does the copy out / split the copy / switch out / split the source / switch in.
Automating this would involve dynamically building the temp table definition(s), including all indexes, from sys.tables/sys.columns/sys.indexes/sys.index_columns and other similar views. Same way SMO Scripting does it.
Yes, you can switch partitions in an automated process. Here is a code sample you can customise. It is driven from a metadata table.
CREATE TABLE [dbo].[PartitionTableSetup](
[Id] [int] IDENTITY(1,1) NOT NULL,
[TableName] [varchar](256) NULL,
[SwitchTable] [varchar](256) NULL,
[Partition] [int] NULL)
DECLARE @merge nvarchar(max)
select @merge = (
Select N'' + com + '' from (
Select N' ALTER TABLE '
+ TableName +
' SWITCH PARTITION 2 TO '
+ SwitchTable
+ ' PARTITION 2 Truncate table '
+ SwitchTable as com
,value
,1 as ord
From (
SELECT convert(datetime,value) as value
,pt.TableName
,pt.SwitchTable
FROM sys.partition_range_values AS RV
JOIN sys.partition_functions AS PF
ON RV.function_id = PF.function_id
Join dbo.[Partitions] pr
On name = PartitionFunction
Join dbo.PartitionTableSetup pt
On pt.[Partition] = pr.ID
WHERE datediff(d,convert(datetime,value),GETDATE()) > pr.[Range] -3
) a
Union all
Select N' ALTER PARTITION FUNCTION '
+ b.PartitionFunction
+ '() MERGE RANGE ('''
+ Convert(nvarchar,value,121)
+''')' as com
,value
,2 as ord
From (
SELECT convert(datetime,value) as value
,pr.PartitionFunction
FROM sys.partition_range_values AS RV
JOIN sys.partition_functions AS PF
ON RV.function_id = PF.function_id
Join dbo.[Partitions] pr
On name = PartitionFunction
WHERE datediff(d,convert(datetime,value),GETDATE()) > pr.[Range] -3
) b
) c Order by value
, ord
for xml path ('')
)
EXECUTE (@merge)
