Perform exact search honouring spaces - sql-server

I'm using SQL Server 2017; my collation is SQL_LATIN1_GENERAL_CP1_CI_AS and ANSI_PADDING is at its default value (ON).
In my table, one of the columns is of type NVARCHAR(255), and one of its values was inserted with a trailing space:
N'abc '
When I search for the value without the space (N'abc'), I don't want N'abc ' to match, but it does.
I know I can trim spaces when inserting records, but I can't change records that are already inserted.
How can I prevent the following query from finding it?
CREATE TABLE #tmp (c1 nvarchar(255))
INSERT INTO #tmp
VALUES (N'abc ')
SELECT *
FROM #tmp
WHERE c1 = N'abc'
DROP TABLE #tmp
I also found this article, but I want to prevent the match at query time:
Why the SQL Server ignore the empty space at the end automatically?
I'm using LINQ to Entities with C#. In a SQL query I can get an exact match with the LIKE keyword and no percent character:
SELECT *
FROM #tmp
WHERE c1 LIKE N'abc'
But with Linq, I don't know how to write this query:
entity.Temp.Where(p => p.c1 == "abc");
entity.Temp.Where(p => p.c1.Equals("abc"));
entity.Temp.Where(p => p.c1.Contains("abc"));

You can try:
SELECT * FROM #tmp WHERE cast(c1 as varbinary(510)) = cast(N'abc' as varbinary(510))
This would be very slow if you have a lot of rows, but it works.
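A variation on the same idea, offered only as a sketch: keep the ordinary equality and add a DATALENGTH check, since DATALENGTH counts bytes including trailing spaces. The sample rows below are made up for illustration. Unlike the varbinary comparison, the equality part still follows the column's collation, so under a case-insensitive collation N'ABC' would also match.

```sql
CREATE TABLE #tmp (c1 nvarchar(255));
INSERT INTO #tmp (c1) VALUES (N'abc '), (N'abc'), (N'ABC');  -- illustrative rows

-- "=" pads the shorter operand with spaces, so on its own it also matches N'abc '.
-- DATALENGTH returns the byte length (trailing spaces included), which filters
-- out the padded value: N'abc ' is 8 bytes, N'abc' is 6 bytes.
SELECT c1
FROM #tmp
WHERE c1 = N'abc'
  AND DATALENGTH(c1) = DATALENGTH(N'abc');

DROP TABLE #tmp;
```

Because the plain equality predicate is still there, an index on c1 (if one exists) can still be used for it, with the DATALENGTH comparison applied to the rows that the equality returns.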

Related

Can I do a bulk insert into a table from Microsoft SQL Server Management Studio with a copy and paste list?

I often get a list of names I need to update in a table from an Excel list, and I end up creating an SSIS program that reads the file into a staging table and doing it that way. But is there a way I could just copy and paste the names into a table from Management Studio directly? Something like this:
create table #temp (personID int, userName varchar(15))
Insert
Into #temp (userName)
values (
'kmcenti1',
'ladams5',
'madams3',
'haguir1',
)
Obviously this doesn't work but I've tried different variations and nothing seems to work.
Here's an option with less string manipulation. Just paste your values between the single quotes
Declare @List varchar(max) = '
kmcenti1
ladams5
madams3
haguir1
'
Insert into #Temp (userName)
Select username=value
From string_split(replace(@List,char(10),''),char(13))
Where Value <>''
For Multiple Columns
Source:
-- This is a copy/paste from Excel --
-- This includes Headers which is optional --
-- There is a TAB between cells --
Declare @List nvarchar(max) = '
Name	Age	email
kmcenti1	25	kmcenti1@gmail.com
ladams5	32	ladams5@gmail.com
madams3	18	madams3@gmail.com
haguir1	36	haguir1@gmail.com
'
Select Pos1 = JSON_VALUE(JS,'$[0]')
,Pos2 = JSON_VALUE(JS,'$[1]') -- could try_convert(int)
,Pos3 = JSON_VALUE(JS,'$[2]')
From string_split(replace(replace(@List,char(10),''),char(9),'||'),char(13)) A
Cross Apply (values ('["'+replace(string_escape(Value,'json'),'||','","')+'"]') ) B(JS)
Where Value <>''
and nullif(JSON_VALUE(JS,'$[0]'),'')<>'Name'
Is this along the lines you're looking for?
create table #temp (personID int identity(1,1), userName varchar(15))
insert into #temp (userName)
select n from (values
('kmcenti1'),
('ladams5'),
('madams3'),
('haguir1'))x(n);
This assumes you want the ID generated for you since it's not in your data.
The SQL statement you have won't work (it's a single row with four values). But I have a workaround: build what you need with a formula in Excel.
Assuming user IDs are in column A:
In cell B1, insert this formula:
="('"&A1&"'),"
And then drag the formula down your list.
Go to SSMS and type in:
insert into [your table](userName) values
And then paste in column B from Excel and delete the last comma.

Query tuning required for expensive query

Can someone help me optimize this code? I could optimize it by using a computed column, but we cannot change the schema on prod, as we are not sure how many APIs are used to push data into this table. The table has millions of rows, and adding a non-clustered index is not helping: because of the query cost, it still goes for a scan.
create table testcts(
name varchar(100)
)
go
insert into testcts(
name
)
select 'VK.cts.com'
union
select 'GK.ms.com'
go
DECLARE @list varchar(100) = 'VK,GK'
select * from testcts where replace(replace(name,'.cts.com',''),'.ms.com','') in (select value from string_split(@list,','))
drop table testcts
One possibility might be to strip off the .cts.com and .ms.com subdomain/domain endings before you insert or store the name data in your table. Then, use the following query instead:
SELECT *
FROM testcts
WHERE name IN (SELECT value FROM STRING_SPLIT(@list, ','));
Now SQL Server should be able to use an index on the name column.
If your values are always suffixed by cts.com or ms.com you could add that to the search pattern:
SELECT {YourColumns} --Don't use *
FROM dbo.testcts t
JOIN (SELECT CONCAT(SS.[value], V.Suffix) AS [value]
FROM STRING_SPLIT(@list, ',') SS
CROSS APPLY (VALUES ('.cts.com'),
('.ms.com')) V (Suffix) ) L ON t.[name] = L.[value];
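For reference, here is a self-contained sketch of that suffix approach using the question's sample table. The index name IX_testcts_name and the extra ZZ row are assumptions added for illustration. The point is that the search list is expanded into full candidate names (VK.cts.com, VK.ms.com, GK.cts.com, GK.ms.com), so the name column is compared as-is rather than wrapped in REPLACE.

```sql
CREATE TABLE dbo.testcts (name varchar(100) NOT NULL);
CREATE INDEX IX_testcts_name ON dbo.testcts (name);  -- hypothetical index name

INSERT INTO dbo.testcts (name)
VALUES ('VK.cts.com'), ('GK.ms.com'), ('ZZ.cts.com');  -- ZZ row added for illustration

DECLARE @list varchar(100) = 'VK,GK';

-- Every list entry is combined with every known suffix, then matched
-- against the untouched column so an index on name can be considered.
SELECT t.name
FROM dbo.testcts AS t
JOIN (SELECT CONCAT(SS.[value], V.Suffix) AS [value]
      FROM STRING_SPLIT(@list, ',') AS SS
      CROSS APPLY (VALUES ('.cts.com'), ('.ms.com')) AS V (Suffix)) AS L
  ON t.name = L.[value];

DROP TABLE dbo.testcts;
```

This only works while the set of suffixes is known and small; any name with a different ending would be missed.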

Executing dynamic SQL with return value as a column value for each rows

I have a rather simple query that I started to modify in order to remove a temp table, as we have concurrency issues across many different systems and clients.
Right now the simple solution was to break up the query into multiple separate queries to replicate what SQL was doing before.
I am trying to figure out a way to return the result of a dynamic SQL query as a column value. The new query is quite simple: it looks in the system objects for all tables with a specific name format and outputs them. What I am missing is that for each record I need to output the result of a dynamic query on each of those tables.
The query :
SELECT [name] as 'TableName'
FROM SYSOBJECTS WHERE xtype = 'U'
AND (CHARINDEX('_PCT', [name]) <> 0
OR CHARINDEX('_WHT', [name]) <> 0)
All these tables have a common column called Result, which is a float. What I am trying to do is return a count on this column under a generic WHERE clause that will work with all the tables as well.
A desired query (I know it's not valid) would be:
SELECT [name] as 'TableName',
sp_executesql 'SELECT COUNT(*) FROM ' + [name] + ' WHERE Result > 0 OR (Result < 139 AND CurrentIndex < 15)' as 'ResultValue'
FROM SYSOBJECTS WHERE xtype = 'U'
AND (CHARINDEX('_PCT', [name]) <> 0
OR CHARINDEX('_WHT', [name]) <> 0)
Before, it used to be easy. We had a temp table with 2 columns and filled in the table names first. Then we iterated over the temp table, executed the dynamic SQL, returned the value in an OUTPUT variable, updated the record in the temp table, and finally returned the table.
I have tried a scalar function, but it doesn't support dynamic SQL, so it doesn't work. I would rather not create ~13,000 different queries for the ~13,000 tables.
I have tried using a reference table with triggers to update the status, but it slows the system way too much. The average table inserts and deletes 28 million records. The original temp table query only took 5-6 minutes to execute thanks to very good indexing, and now we are reaching 25-30 minutes.
Is there any solution other than querying the table list and then having the client query each table one by one to learn its status?
We are using SQL Server 2017, in case some new features are available now.
You can use this script for your purpose (tested in SQL Server 2016).
Updated: It should work now as the results are a single set now.
EXEC sp_msforeachtable
@precommand = 'CREATE TABLE ##Statistics
(TableName varchar(128) NOT NULL,
NumOfRows int)',
@command1 ='INSERT INTO ##Statistics (TableName, NumOfRows)
SELECT ''?'' Table_Name, COUNT(*) Row_Count FROM ? WHERE Result > 0 OR (Result < 139 AND CurrentIndex < 15)',
@postcommand = 'SELECT TableName, NumOfRows FROM ##Statistics;
DROP TABLE ##Statistics'
,@whereand = ' And Object_id In (Select Object_id From sys.objects
Where name like ''%_PCT%'' OR name like ''%_WHT%'')'
For more details on sp_msforeachtable, please visit this link.
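Since the question mentions SQL Server 2017, another possible route is to build one dynamic batch with STRING_AGG and collect the counts in a table variable, so that no temp table shared between sessions is involved. This is a sketch under assumptions, not a tested replacement: it assumes every matching table has the Result and CurrentIndex columns, and it matches the underscore literally (as the question's CHARINDEX does) instead of as a LIKE wildcard.

```sql
DECLARE @inserts nvarchar(max);

-- One INSERT statement per matching table; QUOTENAME guards the identifiers
-- and, with a single quote as delimiter, builds the string literal.
SELECT @inserts = STRING_AGG(
           CAST(N'INSERT INTO @r (TableName, ResultValue) SELECT N'
                + QUOTENAME(t.name, '''')
                + N', COUNT(*) FROM '
                + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(t.name)
                + N' WHERE Result > 0 OR (Result < 139 AND CurrentIndex < 15);'
                AS nvarchar(max)),   -- cast avoids the non-max length limit
           N' ')
FROM sys.tables AS t
WHERE t.name LIKE N'%[_]PCT%'
   OR t.name LIKE N'%[_]WHT%';

IF @inserts IS NOT NULL              -- NULL means no table matched
BEGIN
    -- The table variable lives only inside this dynamic batch.
    DECLARE @sql nvarchar(max) =
        N'DECLARE @r TABLE (TableName sysname NOT NULL, ResultValue int NOT NULL); '
        + @inserts
        + N' SELECT TableName, ResultValue FROM @r ORDER BY TableName;';

    EXEC sys.sp_executesql @sql;
END;
```

The counting work per table is the same as before, so this mainly changes how the results are gathered rather than how long each count takes.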

Splitting multiple fields by delimiter

I have to write an SP that can perform partial updates on our databases; the changes are stored in records of the PU table. A Values field contains all values, delimited by a fixed delimiter. A Table field refers to a Schemes table, which contains the column names for each table in a similar fashion in a Columns field.
Now for my SP I need to split the Values field and the Columns field into a temp table with Column/Value pairs; this happens for each record in the PU table.
An example:
Our PU table looks something like this:
CREATE TABLE [dbo].[PU](
[Table] [nvarchar](50) NOT NULL,
[Values] [nvarchar](max) NOT NULL
)
Insert SQL for this example:
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','John Doe;26');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Jane Doe;22');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Mike Johnson;20');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Person','Mary Jane;24');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','Mathematics');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','English');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Course','Geography');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus A;Schools Road 1;Educationville');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus B;Schools Road 31;Educationville');
INSERT INTO [dbo].[PU]([Table],[Values]) VALUES ('Campus','Campus C;Schools Road 22;Educationville');
And we have a Schemes table similar to this:
CREATE TABLE [dbo].[Schemes](
[Table] [nvarchar](50) NOT NULL,
[Columns] [nvarchar](max) NOT NULL
)
Insert SQL for this example:
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Person','[Name];[Age]');
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Course','[Name]');
INSERT INTO [dbo].[Schemes]([Table],[Columns]) VALUES ('Campus','[Name];[Address];[City]');
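As a result, the first record of the PU table should result in a temp table like:
Column|Value
[Name]|John Doe
[Age]|26
The 5th will have:
Column|Value
[Name]|Mathematics
Finally, the 8th PU record should result in:
Column|Value
[Name]|Campus A
[Address]|Schools Road 1
[City]|Educationville
You get the idea.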
I tried to use the following query to create the temp tables, but alas it fails when there's more than one value in the PU record:
DECLARE @Fields TABLE
(
[Column] INT,
[Value] VARCHAR(MAX)
)
INSERT INTO @Fields
SELECT TOP 1
(SELECT Value FROM STRING_SPLIT([dbo].[Schemes].[Columns], ';')),
(SELECT Value FROM STRING_SPLIT([dbo].[PU].[Values], ';'))
FROM [dbo].[PU] INNER JOIN [dbo].[Schemes] ON [dbo].[PU].[Table] = [dbo].[Schemes].[Table]
TOP 1 correctly gets the first PU record as each PU record is removed once processed.
The error is:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
In the case of a Person record, the splits are indeed returning 2 values/columns at a time; I just want to store the values in 2 records instead of getting an error.
Any help on rewriting the above query?
Also do note that the data is just generic nonsense. Being able to have 2 fields that both have delimited values, always equal in amount (e.g. a 'person' in the PU table will always have 2 delimited values in the field), and break them up in several column/header rows is the point of the question.
UPDATE: Working implementation
Based on the (accepted) answer of Sean Lange, I was able to work out the following implementation to overcome the issue:
As I need to reuse it, the combine column/value functionality is performed by a new function, declared as such:
CREATE FUNCTION [dbo].[JoinDelimitedColumnValue]
(@splitValues VARCHAR(8000), @splitColumns VARCHAR(8000),@pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH MyValues AS
(
SELECT ColumnPosition = x.ItemNumber,
ColumnValue = x.Item
FROM dbo.DelimitedSplit8K(@splitValues, @pDelimiter) x
)
, ColumnData AS
(
SELECT ColumnPosition = x.ItemNumber,
ColumnName = x.Item
FROM dbo.DelimitedSplit8K(@splitColumns, @pDelimiter) x
)
SELECT cd.ColumnName,
v.ColumnValue
FROM MyValues v
JOIN ColumnData cd ON cd.ColumnPosition = v.ColumnPosition
;
In case of the above sample data, I'd call this function with the following SQL:
DECLARE @FieldValues VARCHAR(8000), @FieldColumns VARCHAR(8000)
SELECT TOP 1 @FieldValues=[dbo].[PU].[Values], @FieldColumns=[dbo].[Schemes].[Columns] FROM [dbo].[PU] INNER JOIN [dbo].[Schemes] ON [dbo].[PU].[Table] = [dbo].[Schemes].[Table]
INSERT INTO @Fields
SELECT [Column] = x.[ColumnName],[Value] = x.[ColumnValue] FROM [dbo].[JoinDelimitedColumnValue](@FieldValues, @FieldColumns, @Delimiter) x
This data structure makes this way more complicated than it should be. You can leverage the splitter from Jeff Moden here. http://www.sqlservercentral.com/articles/Tally+Table/72993/ The main difference of that splitter and all the others is that his returns the ordinal position of each element. Why all the other splitters don't do this is beyond me. For things like this it is needed. You have two sets of delimited data and you must ensure that they are both reassembled in the correct order.
The biggest issue I see is that you don't have anything in your main table to function as an anchor for ordering the results correctly. You need something, even an identity, to ensure the output rows stay "together". To accomplish this I just added an identity to the PU table.
alter table PU add RowOrder int identity not null
Now that we have an anchor this is still a little cumbersome for what should be a simple query but it is achievable.
Something like this will now work.
with MyValues as
(
select p.[Table]
, ColumnPosition = x.ItemNumber
, ColumnValue = x.Item
, RowOrder
from PU p
cross apply dbo.DelimitedSplit8K(p.[Values], ';') x
)
, ColumnData as
(
select ColumnName = replace(replace(x.Item, ']', ''), '[', '')
, ColumnPosition = x.ItemNumber
, s.[Table]
from Schemes s
cross apply dbo.DelimitedSplit8K(s.Columns, ';') x
)
select cd.[Table]
, v.ColumnValue
, cd.ColumnName
from MyValues v
join ColumnData cd on cd.[Table] = v.[Table]
and cd.ColumnPosition = v.ColumnPosition
order by v.RowOrder
, v.ColumnPosition
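If installing a splitter function is not an option, a similar position-aware split can be sketched with OPENJSON, which is available from SQL Server 2016 at database compatibility level 130 or higher. Each delimited string is turned into a JSON array, and the [key] column that OPENJSON returns for an array is the zero-based element index, so it can play the role of ItemNumber. This is an illustrative alternative, not the method used above, and it assumes the PU and Schemes tables from the question.

```sql
SELECT p.[Table],
       ColumnName  = c.[value],
       ColumnValue = v.[value],
       Position    = CAST(v.[key] AS int) + 1   -- [key] is the zero-based array index
FROM dbo.PU AS p
JOIN dbo.Schemes AS s
  ON s.[Table] = p.[Table]
-- 'John Doe;26' becomes the JSON array ["John Doe","26"]
CROSS APPLY OPENJSON(N'["' + REPLACE(STRING_ESCAPE(p.[Values], 'json'), N';', N'","') + N'"]') AS v
-- '[Name];[Age]' becomes the JSON array ["[Name]","[Age]"]
CROSS APPLY OPENJSON(N'["' + REPLACE(STRING_ESCAPE(s.[Columns], 'json'), N';', N'","') + N'"]') AS c
WHERE c.[key] = v.[key]                          -- pair elements by position
ORDER BY p.[Table], p.[Values], Position;
```

STRING_ESCAPE keeps quotes or backslashes inside the values from breaking the generated JSON. Without a row identifier on PU, the ORDER BY here groups by the Values text rather than by insertion order.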
I recommend not storing values like this in the first place. I recommend having a key value in the tables and preferably not using Table and Columns as a composite key. I recommend avoiding reserved words. I also don't know what version of SQL you are using. I am going to assume you are using a fairly recent version of Microsoft SQL Server that will support my provided stored procedure.
Here is an overview of the solution:
1) You need to convert both the PU and the Schema table into a table where you will have each "column" value in the list of columns isolated in their own row. If you can store the data in this format rather than the provided format, you will be a little better off.
What I mean is
Table|Columns
Person|Jane Doe;22
needs to be converted to
Table|Column|OrderInList
Person|Jane Doe|1
Person|22|2
There are multiple ways to do this, but I prefer an XML trick that I picked up. You can find multiple split string examples online, so I will not focus on that. Use whatever gives you the best performance. Unfortunately, you might not be able to get away from this table-valued function.
Update:
Thanks to Shnugo's performance enhancement comment, I have updated my xml splitter to give you the row number which reduces some of my code. I do the exact same thing to the Schema list.
2) Since the new Schema table and the new PU table now have the order each column appears, the PU table and the schema table can be joined on the "Table" and the OrderInList
CREATE FUNCTION [dbo].[fnSplitStrings_XML]
(
@List NVARCHAR(MAX),
@Delimiter VARCHAR(255)
)
RETURNS TABLE
AS
RETURN
(
SELECT y.i.value('(./text())[1]', 'nvarchar(4000)') AS Item,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) as RowNumber
FROM
(
SELECT CONVERT(XML, '<i>'
+ REPLACE(@List, @Delimiter, '</i><i>')
+ '</i>').query('.') AS x
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
CREATE Procedure uspGetColumnValues
as
Begin
--Split each value in PU
select p.[Table],p.[Values],a.[Item],CHARINDEX(a.Item,p.[Values]) as LocationInStringForSorting,a.RowNumber
into #PuWithOrder
from PU p
cross apply [fnSplitStrings_XML](p.[Values],';') a --use whatever string split function is working best for you (performance wise)
--Split each value in Schema
select s.[Table],s.[Columns],a.[Item],CHARINDEX(a.Item,s.[Columns]) as LocationInStringForSorting,a.RowNumber
into #SchemaWithOrder
from Schemes s
cross apply [fnSplitStrings_XML](s.[Columns],';') a --use whatever string split function is working best for you (performance wise)
DECLARE @Fields TABLE --If this is an ETL process, maybe make this a permanent table with an auto incrementing Id and reference this table in all steps after this.
(
[Table] NVARCHAR(50),
[Columns] NVARCHAR(MAX),
[Column] VARCHAR(MAX),
[Value] VARCHAR(MAX),
OrderInList int
)
INSERT INTO @Fields([Table],[Columns],[Column],[Value],OrderInList)
Select pu.[Table],pu.[Values] as [Columns],s.Item as [Column],pu.Item as [Value],pu.RowNumber
from #PuWithOrder pu
join #SchemaWithOrder s on pu.[Table]=s.[Table] and pu.RowNumber=s.RowNumber
Select [Table],[Columns],[Column],[Value],OrderInList
from @Fields
order by [Table],[Columns],OrderInList
END
GO
EXEC uspGetColumnValues
GO
Update:
Since your working implementation is a table-valued function, I have another recommendation. The problem I see is that you're using a table-valued function which ultimately handles one record at a time. You are going to have better performance with set-based operations and batching as needed. With a table-valued function, you are likely going to be looping through each row. If this is some sort of ETL process, your team will be better off if you have a stored procedure that processes the rows in bulk. It might make sense to stage the results into a better table that your team can work with downstream rather than have them use a potentially slow table-valued function.

Search index in SQL Server ignoring special characters

I have an [nvarchar] column in a SQL Server table containing data like 123456789, 123-456789, 1234.56.789, 1.23456-789 and so on. The users just add dots, minus and spaces somewhere for readability and I don't know where.
Is there a way to create an index which ignores Special characters and find these when searching for plain "123456789"?
No, there is no way to do exactly what you want in the way that you want.
The best mechanism for doing this is to use a computed column. It does not need to be persisted to be indexed.
Initial Position
CREATE TABLE YourTable
(
YourColumn NVARCHAR(50)
);
INSERT INTO YourTable
VALUES ('123456789'),
('123-456789'),
('1234.56.789'),
('1.23456-789');
Create computed column and index it.
ALTER TABLE YourTable
ADD CanonicalForm AS
CAST(REPLACE(REPLACE(REPLACE(YourColumn, '.', ''), '-', ''), ' ', '') AS NVARCHAR(50));
CREATE INDEX ix
ON YourTable(CanonicalForm)
INCLUDE (YourColumn);
Test it
SELECT *
FROM YourTable
WHERE CanonicalForm = '123456789'
Execution plan seeks on the index
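If the search term itself may contain the same separators, one option is to normalise it the same way before comparing. The sketch below assumes the YourTable and CanonicalForm definitions above; the variable @search and its value are made up for illustration.

```sql
DECLARE @search nvarchar(50) = N'123-456 789';  -- hypothetical user input

-- Strip dots, dashes and spaces from the search term, mirroring the
-- expression used for the computed column, then compare canonical forms.
SELECT YourColumn
FROM YourTable
WHERE CanonicalForm = REPLACE(REPLACE(REPLACE(@search, N'.', N''), N'-', N''), N' ', N'');
```

Only the variable is wrapped in REPLACE here, not the indexed column, so the comparison should still be able to seek on the index.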
