How to optimize replace and find function? - sql-server

I am trying to create a function that replaces certain words in a string with hyperlinks in SQL. When I call the function in a query, it takes a really long time to execute, more than 2-3 minutes. I assume this is because the tag_library table has around 600,000 records, and iterating through that many rows consumes a lot of processing time.
CREATE FUNCTION dbo.ReplaceTags (@body VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    -- the <a> markup is a placeholder; the original hyperlink text was lost
    SELECT @body = REPLACE(@body, name, '<a>' + name + '</a>')
    FROM Tag_Library

    RETURN @body
END
article table (id, title, body)
1, Story1, At the same time there is a list consisting of: DUCHS, EUROC, GLSPE and WODST. Only two of the tags have covered with the prices in the last three months - GROSV at 99.11 on 8 October and JUBIL at 0s on 11 September.
tag_library table (id, name)
1,DRYDN33
2,DUCHS
3,DRYDN33
4,DRYDN15
5,EUROC
6,DRYDN15
7,GROSV
Hence, I am writing to seek some advice: is there a way to make this SQL function optimal, or would it be better to change it into an insert trigger?
Please advise, if possible.

Just a thought, I did not test it:
Change your query to this one:
SELECT
    @body = REPLACE(@body, name, '<a>' + name + '</a>')
FROM
    Tag_Library
WHERE
    @body LIKE '%' + name + '%'
This should filter the Tag_Library table to the records which are actually present in the input string, so SQL Server does not have to process lots of unnecessary replaces. BUT it will not prevent a full table/index scan to check the table!
You can improve this solution by storing the required tags in a table per article (and updating that table via triggers when the source records/tables change). In that case you can use joins to filter the Tag_Library table (instead of the LIKE operator), but it requires extra code to maintain the dictionary - a sketch follows below.
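Untested sketch of that dictionary approach - Article_Tag is a hypothetical mapping table, and it assumes the function also receives an @article_id parameter (neither exists in the original):

-- one row per (article, tag) pair, maintained by triggers on the source tables
CREATE TABLE dbo.Article_Tag
(
    article_id INT NOT NULL,
    tag_id     INT NOT NULL,
    PRIMARY KEY (article_id, tag_id)
);

-- inside the function: filter via a join instead of LIKE
SELECT @body = REPLACE(@body, t.name, '<a>' + t.name + '</a>')
FROM dbo.Article_Tag AS m
JOIN dbo.Tag_Library AS t ON t.id = m.tag_id
WHERE m.article_id = @article_id;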

You're focusing on the wrong thing. The problem is that this is a scalar function, and they perform miserably. You should change it to a table-valued function that returns a single row and use APPLY.
See, for example:
http://dataeducation.com/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post/
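For illustration, a sketch of that shape (untested; it keeps the iterative REPLACE from the question as a multi-statement TVF returning one row, and the <a> markup is again a placeholder):

CREATE FUNCTION dbo.ReplaceTagsTVF (@body VARCHAR(MAX))
RETURNS @result TABLE (body VARCHAR(MAX))
AS
BEGIN
    -- accumulate all replacements into the local copy of @body
    SELECT @body = REPLACE(@body, name, '<a>' + name + '</a>')
    FROM Tag_Library
    WHERE @body LIKE '%' + name + '%';

    INSERT @result (body) VALUES (@body);
    RETURN;
END
GO

-- usage: one call per article row via APPLY
SELECT a.id, a.title, r.body
FROM article AS a
CROSS APPLY dbo.ReplaceTagsTVF(a.body) AS r;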

Related

Trying to speed up a SQL query

I'm still in the process of getting to fully understand SQL Server. I have written a stored procedure, shown below:
ALTER PROC [dbo].[Specific_Street_Lookup]
    @STR VARCHAR(50),
    @CNT INT
AS
BEGIN
    SELECT DISTINCT TOP (@CNT)
        street_desc, street_localitydesc, postcode_selected
    FROM
        Full_Streets
    INNER JOIN
        Postcodes ON Full_Streets.street_postcodeid = Postcodes.postcode_id
    WHERE
        street_desc LIKE @STR + '%'
        AND postcode_selected = 'TRUE'
    ORDER BY
        street_desc, street_localitydesc
END
but it can take up to 7 seconds to return a result, and I'm not sure what I can do to speed up the query.
The Full_Streets table has a row count of 856,800.
The Postcodes table has a row count of 856,208.
Both tables have a primary key (street_id & postcode_id).
The purpose of the query: in my VB.net app, as the user types in a street to look up, it returns a number of records (@CNT) that match the partial string (LIKE @STR + '%'), and only if postcode_selected = 'TRUE'.
I'm sure there must be a quicker / better way to do this and any help would be appreciated.
Thanks
Can you try with this index?
CREATE INDEX NCI_street_desc ON Full_Streets(street_desc) INCLUDE(street_localitydesc)
The LIKE operator is evil on such a big table, and I don't think you can optimize this query with normal indexes.
Consider using Full-Text Search functionality. With full-text search you can't search arbitrary portions of strings (unless you make a special table where you pre-save all the possible portions of your strings), but performance is hugely superior to what you can achieve with the LIKE operator.
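A rough sketch of the full-text route, assuming a default full-text catalog exists and that the primary key index is named PK_Full_Streets (both are assumptions):

-- one-time setup
CREATE FULLTEXT INDEX ON dbo.Full_Streets (street_desc)
    KEY INDEX PK_Full_Streets;

-- inside the procedure, using its @STR/@CNT parameters:
-- prefix search, the full-text analogue of street_desc LIKE @STR + '%'
DECLARE @fts NVARCHAR(60);
SET @fts = '"' + @STR + '*"';

SELECT TOP (@CNT) street_desc
FROM dbo.Full_Streets
WHERE CONTAINS(street_desc, @fts);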
I would change the postcode_selected column type to bit (TRUE = 1, FALSE = 0) and then modify the stored procedure accordingly - it will reduce the time complexity of the query.
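A minimal sketch of that change, assuming the column only ever holds 'TRUE'/'FALSE' (which SQL Server can convert to bit directly):

ALTER TABLE Postcodes ALTER COLUMN postcode_selected BIT NOT NULL;

-- and in the procedure, the filter becomes:
-- AND postcode_selected = 1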

Optimize SQL UPDATE statement/SqlCommand

I have the following statement:
UPDATE Table SET Column=Value WHERE TableID IN ({0})
I have a comma-delimited list of TableIDs that can be pretty lengthy (for replacing {0}). I've found that this is faster than using a SqlDataAdapter; however, I also noticed that if the command text is too long, the SqlCommand might perform poorly.
Any ideas?
This is inside of a CLR trigger. Each SqlCommand execution incurs some overhead. I've determined that the above command is better than SqlDataAdapter.Update() because Update() will update individual records, incurring several SQL statements to be executed.
...I ended up doing the following (trigger time went from .7 to .25 seconds):
UPDATE T SET Column=Value FROM Table T INNER JOIN INSERTED AS I ON (I.TableID=T.TableID)
When there is a long list, the execution plan is probably using an index scan instead of an index seek. In this case, you are probably better off limiting the list to several items, but call the update command repeatedly until all items in the list are accommodated.
Split your list of IDs into batches, maybe. I assume you have the list of ID numbers in a collection and you're building up the {0} string, so maybe update 20 or 100 at a time.
Wrap it in a transaction and perform all the updates before calling Commit()
If this is a stored procedure, I would use a Table-Valued Parameter. If this is an ad hoc batch, then consider populating a temporary table and joining to it in your batch. Your IN clause is rationalized as a bunch of ORs, which can quite easily negate the use of an index. With a JOIN you may get a better plan from the optimizer.
DECLARE @Value VARCHAR(100) = 'Some value';

CREATE TABLE #Table (TableID INT PRIMARY KEY);

INSERT INTO #Table VALUES (1),(2),(3),(n)...;

MERGE INTO Schema.Table AS target
USING #Table AS source
    ON target.TableID = source.TableID
WHEN MATCHED THEN
    UPDATE SET Column = @Value;
If you can use a stored procedure, you could use a MERGE statement instead.
MERGE INTO Table AS target
USING @TableIDList AS source
    ON target.TableID = source.ID
WHEN MATCHED THEN
    UPDATE SET Column = source.Value;
where @TableIDList is a table type sent from code as a table-valued parameter, containing the IDs (and possibly Values) you need.
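A sketch of the table type and procedure signature that approach assumes (all names hypothetical):

CREATE TYPE dbo.TableIDList AS TABLE
(
    ID    INT PRIMARY KEY,
    Value VARCHAR(100)
);
GO

CREATE PROCEDURE dbo.UpdateTableFromList
    @TableIDList dbo.TableIDList READONLY
AS
MERGE INTO [Table] AS target
USING @TableIDList AS source
    ON target.TableID = source.ID
WHEN MATCHED THEN
    UPDATE SET [Column] = source.Value;

From the C# side, the parameter is passed with SqlDbType.Structured.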

SQL SELECT using XML input

I currently have a C# application that responds to HTTP requests. The body of the HTTP request (XML) is passed to SQL Server, at which time the database engine performs the correct instruction. One of the instructions is used to load information about invoices using the id of the customer (InvoiceLoad):
<InvoiceLoad ControlNumber="12345678901">
<Invoice>
<CustomerID>johndoe@gmail.com</CustomerID>
</Invoice>
</InvoiceLoad>
I need to perform a SELECT operation against the invoice table (which contains the associated email address).
I've tried using:
SELECT 'Date', 'Status', 'Location'
FROM Invoices
WHERE Email_Address = Invoice.A.value(.)
using an xml.nodes('InvoiceLoad/Invoice/CustomerId') Invoice(A)
command.
However, as this query may run THOUSANDS of times per minute, I want to make it as fast as possible. I'm hearing that one way to do this may be to use CROSS APPLY (which I have never used). Is that the solution? If not, how exactly would I go about making this query as fast as possible? Any and all suggestions are greatly appreciated!
I don't see why you would need a call to .nodes() at all - from what I understand, each XML fragment has just a single entry - right?
So given this XML:
<InvoiceLoad ControlNumber="12345678901">
<Invoice>
<CustomerID>johndoe@gmail.com</CustomerID>
</Invoice>
</InvoiceLoad>
you can use this SQL to get the value of the <CustomerID> node:
DECLARE @xmlvar XML
SET @xmlvar = '<InvoiceLoad ControlNumber="12345678901">
    <Invoice>
        <CustomerID>johndoe@gmail.com</CustomerID>
    </Invoice>
</InvoiceLoad>'

SELECT
    @xmlvar.value('(/InvoiceLoad/Invoice/CustomerID)[1]', 'varchar(100)')
and you can join this against your customer table or whatever you need to do.
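Applied to the query from the question, that could look like this (column names taken from the question, otherwise unverified):

SELECT [Date], [Status], [Location]
FROM Invoices
WHERE Email_Address =
      @xmlvar.value('(/InvoiceLoad/Invoice/CustomerID)[1]', 'varchar(100)');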
If you have the XML stored in a table, and you always need to extract that value from <CustomerID>, you could also think about creating a computed, persisted column on that table that would extract that e-mail address into a separate column, which you could then use for easy joining. This requires a little bit of work - a stored function taking the XML as input - but it's really quite a nice way to "surface" certain important snippets of data from your XML.
Step 1: create your function
CREATE FUNCTION dbo.ExtractCustomer (@input XML)
RETURNS VARCHAR(255)
WITH SCHEMABINDING
AS BEGIN
    DECLARE @Result VARCHAR(255)

    SELECT
        @Result = @input.value('(/InvoiceLoad/Invoice/CustomerID)[1]', 'varchar(255)')

    RETURN @Result
END
So given your XML, you get the one <CustomerID> node and extract its "inner text" and return it as a VARCHAR(255).
Step 2: add a computed, persisted column to your table
ALTER TABLE dbo.YourTableWithTheXML
ADD CustomerID AS dbo.ExtractCustomer(YourXmlColumnHere) PERSISTED
Now, your table that has the XML column has a new column - CustomerID - which will automagically contain the contents of the <CustomerID> as a VARCHAR(255). The value is persisted, i.e. as long as the XML doesn't change, it doesn't have to be re-computed. You can use that column like any other on your table, and you can even index it to speed up any joins on it!
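For example (names matching the hypothetical table above):

CREATE INDEX IX_YourTableWithTheXML_CustomerID
    ON dbo.YourTableWithTheXML (CustomerID);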

How to execute a long dynamic query (greater than 4000 characters) - again

Note: I'm running under SQL Server 2008 R2...
I've taken the time to read dozens of posts on this site and other sites on how to execute dynamic SQL where the query is more than 4000 characters. I've tried more than a dozen solutions proposed. The consensus seems to be to split the query into 4000-character variables and then do:
EXEC (@SQLQuery1 + @SQLQuery2)
This doesn't work for me - the query is truncated at the end of @SQLQuery1.
Now, I've seen samples of how people "force" a long query by using REPLICATE with a bunch of spaces, etc., but this is a real query - and it gets a little more sophisticated than that.
I have a SQL view with the name "Company_A_ItemView".
I have 10 companies that I want to create the same exact view, with different names, e.g.
"Company_B_ItemView"
"Company_C_ItemView"
..etc.
If you offer help, please don't ask why there are multiple views - just accept that I need to do it this way, OK?
Each company has its own set of tables, and the CREATE VIEW statement references several tables by name. Here's a BRIEF sample, but remember, the total length of the query is around 6000 characters:
CREATE view [dbo].[Company_A_ItemView] as
select
WE.[Item No_],
WE.[Location Code],
LOC.[Bin Number],
[..more fields, etc.]
from
[Company_A_Warehouse_Entry] WE
left join
[Company_A_Location] LOC
...you get the idea
So, what I am currently doing is:
a. Pulling the contents of the CREATE VIEW statement into 2 declared variables, e.g.
Set @SQLQuery1 = (select text
                  from syscomments
                  where ID = 1382894081 and colid = 1)
Set @SQLQuery2 = (select text
                  from syscomments
                  where ID = 1382894081 and colid = 2)
Note that this is how SQL stores long definitions - when you create the view, it stores the text in multiple syscomments records. In my case, the view is split into a chunk of 3591 characters in the first syscomments record, with the rest of the text in the second record. I have no idea why SQL doesn't use all 4000 characters of the syscomments field. And the statement is broken in the middle of a word.
Please note that in all my examples, the @SQLQueryxxx variables are declared as varchar(max). I've also tried declaring them as nvarchar(max), varchar(8000), and nvarchar(8000), with the same results.
b. I then do a "Search and Replace" for "Company_A", replacing it with "Company_B". In the code below, the variable @CompanyID is first set to 'Company_B':
SET @SQLQueryNew1 = @SQLQuery1
SET @SQLQueryNew1 = REPLACE(@SQLQueryNew1, 'Company_A', @CompanyID)
SET @SQLQueryNew2 = @SQLQuery2
SET @SQLQueryNew2 = REPLACE(@SQLQueryNew2, 'Company_A', @CompanyID)
c. I then try:
EXEC (@SQLQueryNew1 + @SQLQueryNew2)
The message returned indicates that it's trying to execute the statement truncated at the end of @SQLQueryNew1, i.e. roughly 80% of the query's text.
I've tried CAST'ing the final result to a new varchar(max) and nvarchar(max) - no luck.
I've tried CAST'ing the original query to a new varchar(max) and nvarchar(max) - no luck.
I've looked at the result of retrieving the original CREATE VIEW statement, and it's fine.
I've tried various other ways of retrieving the original CREATE VIEW statement, such as:
Set @SQLQuery1 = (select VIEW_DEFINITION
                  from [MY_DATABASE].[INFORMATION_SCHEMA].[VIEWS]
                  where TABLE_NAME = 'Company_A_ItemView')
This one returns only the first 4000 characters of the CREATE VIEW.
Set @SQLQuery1 = (SELECT OBJECT_DEFINITION(@ObjectID))
If I do a
SELECT LEN(OBJECT_DEFINITION(@ObjectID))
it returns the correct length of the query (e.g. 5191), but if I look at @SQLQuery1, or try to EXEC(@SQLQuery1), the statement is still truncated.
d. There are some references that state that since I'm manipulating the text of the query after retrieving it, the resulting variables are then truncated to 4000 characters. I've tried CAST'ing the result as I do the REPLACE, e.g.
SET @SQLQueryNew1 = CAST(REPLACE(@SQLQueryNew1,
                                 'Company_A',
                                 @CompanyID) AS varchar(max))
Same result.
I know there are other methods, such as creating stored procedures for creating the views. But the views are still being developed and are somewhat "in flux", so placing the text of the CREATE VIEW inside a stored proc is cumbersome. My goal is to be able to take Company_A's view and replicate it exactly, multiple times, except referencing Company_B's view name and table names, Company_C's view name and table names, etc.
I'm wondering if there is anyone out there who has done this type of manipulation of a long SQL CREATE VIEW statement and tried to execute it.
Just use VARCHAR(MAX) or NVARCHAR(MAX). They work fine for EXEC(string).
FYI,
Note that this is how SQL stores long definitions - when you create
the view, it stores the text into multiple syscomments records.
This is not correct. This is how it used to be done in SQL Server 2000. Since SQL Server 2005, definitions are saved as NVARCHAR(MAX) in a single entry in sys.sql_modules.
syscomments is still around, but it is retained read-only solely for compatibility.
So all you should need to do is change your @SQLQuery1, @SQLQuery2, etc. variables to a single NVARCHAR(MAX) variable and pull your view code from the [definition] column of the sys.sql_modules catalog view instead.
Note that you should be careful with your string manipulations, as certain functions will revert to (N)VARCHAR(4000) output if all of their input arguments are not (N)VARCHAR(MAX). (Sorry, I do not know which ones, but REPLACE() may be one.) In fact, this may be what has been causing so much confusion in your tests.
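Putting that together, a sketch of the whole round trip (untested; the CAST is belt-and-braces against any 4000-character reversion):

DECLARE @SQLQuery NVARCHAR(MAX);
DECLARE @CompanyID NVARCHAR(50);
SET @CompanyID = N'Company_B';

-- pull the full definition in one piece
SELECT @SQLQuery = [definition]
FROM sys.sql_modules
WHERE object_id = OBJECT_ID(N'dbo.Company_A_ItemView');

-- swap the company prefix in the view name and the table names
SET @SQLQuery = CAST(REPLACE(@SQLQuery, N'Company_A', @CompanyID) AS NVARCHAR(MAX));

EXEC (@SQLQuery);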
Declare your SQL variables (@SQLQuery1, ...) as nvarchar(4000).
Be sure each SQL part doesn't exceed 4000 bytes (copy each part to a text file and check the file size in bytes).

SQL Server 2000: search throughout database

Somehow, some records in my table are getting updated with a value of 'xyz' in a certain column. Out of hundreds of stored procedures, functions, and triggers, how can I determine which code is doing this? Is there a way to search through each and every script of code in the database?
Please help.
One approach is to check syscomments, which:
Contains entries for each view, rule, default, trigger, CHECK constraint, DEFAULT constraint, and stored procedure within the database. The text column contains the original SQL definition statements.
e.g. select text from syscomments
If you are having trouble finding that literal string, the values could be coming from a table, or they could be being concatenated within a routine.
Try this
Select text from syscomments
where CharIndex('x', text) > 0
and CharIndex('y', text) > 0
and CharIndex('z', text) > 0
That might help you either find the right routine, or further indicate that the values are coming from a table.
This is going to be nearly impossible to do in SQL Server 2000, because the update might very well come from a variable that has that value, or from a join to another table that has that value, and not be hard-coded into the stored proc, trigger, etc. The update could also be coming from a DTS package, a job, a piece of dynamic code run by the app, or even from Query Analyzer, so the code itself may not be recorded in the database anywhere.
Perhaps a better approach might be to create an audit table for the table in question and have it record the user and the code from the spid that generated the change as well as the old and new values. You'll have to wait until it happens again, but then you would know exactly what changed the value and what value to put it back to if need be.
Alternatively you could run profiler on the system until it happens but profiler tends to hurt performance and is not usually a good idea to run on a production system. If it is happening very often, it might be an acceptable alternative.
Here's a hint as to how you might get some of the info you want for the eventual trigger code you write:
create table #temp (eventtype nvarchar(1000), parameters int, eventinfo nvarchar(4000), myspid int)

declare @myspid int
select @myspid = @@spid

insert #temp (eventtype, parameters, eventinfo)
exec ('dbcc inputbuffer (@@spid)')

update #temp
set myspid = @myspid

select hostname, program_name, eventinfo
from #temp t
join sysprocesses s on t.myspid = s.spid
where spid = @myspid
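Building on that hint, a skeleton of the eventual audit trigger (all object and column names are hypothetical):

-- audit table capturing who changed what, and when
CREATE TABLE Audit_MyTable
(
    audit_id   INT IDENTITY PRIMARY KEY,
    changed_at DATETIME DEFAULT GETDATE(),
    login_name SYSNAME  DEFAULT SUSER_SNAME(),
    old_value  VARCHAR(50),
    new_value  VARCHAR(50)
)
GO

CREATE TRIGGER trg_MyTable_Audit ON MyTable
AFTER UPDATE
AS
INSERT Audit_MyTable (old_value, new_value)
SELECT d.SomeColumn, i.SomeColumn
FROM inserted AS i
JOIN deleted AS d ON d.MyTableID = i.MyTableID
GO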
You might use SQL Profiler to trace updates of a given table / column.
