Should I use LIKE or CONTAINS on a second column? - sql-server

I have a table which has 2 columns (nvarchar(max) and varbinary(max). The binary column contains PDF documents and the catalog and index are setup to use this column.
The nvarchar column contains a list of id's (eg. "12","55","69", etc). This column can contain 100's of id's so that text would be quite long.
When building a search query, I always use CONTAINS, eg:
SELECT *
FROM mytable
WHERE CONTAINS(mybinarycolumn, 'keyword')
Depending on the search, I might or might not use the secondary column. So I was going to use IF to execute a second query, like this:
SELECT *
FROM mytable
WHERE CONTAINS(mybinarycolumn, 'keyword') AND
mytextcolumn LIKE '%"55"%'
Would I incur a performance hit if I use LIKE? Is it possible to combine CONTAINS and LIKE into one CONTAINS which might or might not use mytextcolumn in search? (If the text column must be used, it's always and AND with the binary column).

Assuming the normalization option isn't a good one for you...
I'm sure there will be a performance hit. LIKE is never a high performing operation, and you can't really build any indexes to help you out. If you are lucky, the SQL optimizer will do the CONTAINS part of the query first and apply the LIKE only to matching results. (Show execution plan will be your friend here.)
I can't think of a good way to combine the two columns into something that can be searched with a single CONTAINS; anything I've come up with looks like more work than the query as you have it.
You could try putting a full-text index on mytextcolumn and then use CONTAINS on that column as well. I'm not sure if that will help or not, but it may be worth a try.
I assume the values in mytextcolumn are well-delimited. If the column contains unquoted values, e.g. '12,23,45,67,777,890' instead of '"12","23","45","67","777","890"', your LIKE condition won't work the way you expect (because '%55%' would match both '11,22,55' and '11,22,555').
Good luck.

Related

Full text search order by text length

I have a search box which allow users to enter a partial string and show an auto complete for the whole name.
Basically I want to do WHERE LIKE %partial%. But since the LIKE query can't make use of index when there is a leading wildcard, I ended up using full text index
My full text query looks like CONTAINS(ColumnName, 'partial*')
This works well until I had to introduce sorting. The returned result needs to be sorted based on the string length.
CONTAINS(ColumnName, 'partial*') ORDER BY LEN(ColumnName)
The performance stinks since LEN has to be dynamically calculated followed by a sort. I tried creating a calculated column and indexed it, but it did not help much.
The table contains around 100k rows (which is not a huge one) and I wonder how I can tune the index to speed this up.
Even though you said you have a computed column you should make sure that it is persisted. When a computed column is not persisted it has to make the calculation for every row. This essentially makes your query nonSARGable. When having the column persisted the value is stored and the index can be properly utilized.

Need help to improve a T-SQL query

I have a table that contains about 32 million rows. In this table there is item_id (not primary key) and text column.
What I want to do is to concatenate text columns with same item_id's. And use this in a report.
So far we've been using FOR XML PATH keyword to concatenate. But our customer are not happy with the latency.
So we tried COALESCE (we tried the method described here) we didn't get a proper result (or maybe it was going to take too long)
So guys, if you know better method, technique can you help me about this issue?
Thanks...
Add an index for the item_id column.
Maybe use compressed data as field instead of characters.
Use a table with item_id as primary key with as fields: last text id, and concatenated text from first id to last id. Update this table incremental. Should be faster. Costs double space.
Do all of these:
Create indexes for the fields which are part of your where.
Instead of using Descartes multiplication use joins (if you are using Descartes multiplication in your query).
In your where clause put the quicker and less probable logical operands first. For instance where A and B is true if both of them are true, so if A is false, B won't be calculated at all and you win a lot of time. This might be a difference of doing hundreds of thousands of logical checking or not doing it, so, of course this is an optimization.
I hope this helps.

Creating an efficient search capability using SQL Server (and/or coldfusion)

I am trying to visualize how to create a search for an application that we are building. I would like a suggestion on how to approach 'searching' through large sets of data.
For instance, this particular search would be on a 750k record minimum table, of product sku's, sizing, material type, create date, etc;
Is anyone aware of a 'plugin' solution for Coldfusion to do this? I envision a google like single entry search where a customer can type in the part number, or the sizing, etc, and get hits on any or all relevant results.
Currently if I run a 'LIKE' comparison query, it seems to take ages (ok a few seconds, but still), and it is too long. At times making a user sit there and wait up to 10 seconds for queries & page loads.
Or are there any SQL formulas to help accomplish this? I want to use a proven method to search the data, not just a simple SQL like or = comparison operation.
So this is a multi-approach question, should I attack this at the SQL level (as it ultimately looks to be) or is there a plug in/module for ColdFusion that I can grab that will give me speedy, advanced search capability.
You could try indexing your db records with a Verity (or Solr, if CF9) search.
I'm not sure it would be faster, and whether even trying it would be worthwhile would depend a lot on how often you update the records you need to search. If you update them rarely, you could do an Verity Index update whenever you update them. If you update the records constantly, that's going to be a drag on the webserver, and certainly mitigate any possible gains in search speed.
I've never indexed a database via Verity, but I've indexed large collections of PDFs, Word Docs, etc, and I recall the search being pretty fast. I don't know if it will help your current situation, but it might be worth further research.
If your slowdown is specifically the search of textual fields (as I surmise from your mentioning of LIKE), the best solution is building an index table (not to be confiused with DB table indexes that are also part of the answer).
Build an index table mapping the unique ID of your records from main table to a set of words (1 word per row) of the textual field. If it matters, add the field of origin as a 3rd column in the index table, and if you want "relevance" features you may want to consider word count.
Populate the index table with either a trigger (using splitting) or from your app - the latter might be better, simply call a stored proc with both the actual data to insert/update and the list of words already split up.
This will immediately drastically speed up textual search as it will no longer do "LIKE", AND will be able to use indexes on index table (no pun intended) without interfering with indexing on SKU and the like on the main table.
Also, ensure that all the relevant fields are indexed fully - not necessarily in the same compund index (SKU, sizing etc...), and any field that is searched as a range field (sizing or date) is a good candidate for a clustered index (as long as the records are inserted in approximate order of that field's increase or you don't care about insert/update speed as much).
For anything mode detailed, you will need to post your table structure, existing indexes, the queries that are slow and the query plans you have now for those slow queries.
Another item is to enure that as little of the fields are textual as possible, especially ones that are "decodable" - your comment mentioned "is it boxed" in the text fields set. If so, I assume the values are "yes"/"no" or some other very limited data set. If so, simply store a numeric code for valid values and do en/de-coding in your app, and search by the numeric code. Not a tremendous speed improvement but still an improvement.
I've done this using SQL's full text indexes. This will require very application changes and no changes to the database schema except for the addition of the full text index.
First, add the Full Text index to the table. Include in the full text index all of the columns the search should perform against. I'd also recommend having the index auto update; this shouldn't be a problem unless your SQL Server is already being highly taxed.
Second, to do the actual search, you need to convert your query to use a full text search. The first step is to convert the search string into a full text search string. I do this by splitting the search string into words (using the Split method) and then building a search string formatted as:
"Word1*" AND "Word2*" AND "Word3*"
The double-quotes are critical; they tell the full text index where the words begin and end.
Next, to actually execute the full text search, use the ContainsTable command in your query:
SELECT *
from containstable(Bugs, *, '"Word1*" AND "Word2*" AND "Word3*"')
This will return two columns:
Key - The column identified as the primary key of the full text search
Rank - A relative rank of the match (1 - 1000 with a higher ranking meaning a better match).
I've used approaches similar to this many times and I've had good luck with it.
If you want a truly plug-in solution then you should just go with Google itself. It sounds like your doing some kind of e-commerce or commercial site (given the use of the term 'SKU'), So you probably have a catalog of some kind with product pages. If you have consistent markup then you can configure a google appliance or service to do exactly what you want. It will send a bot in to index your pages and find your fields. No SQl, little coding, it will not be dependent on your database, or even coldfusion. It will also be quite fast and familiar to customers.
I was able to do this with a coldfusion site in about 6 hours, done! The only thing to watch out for is that google's index is limited to what the bot can see, so if you have a situation where you want to limit access based on a users role or permissions or group, then it may not be the solution for you (although you can configure a permission service for Google to check with)
Because SQL Server is where your data is that is where your search performance is going to be a possible issue. Make sure you have indexes on the columns you are searching on and if using a like you can't use and index if you do this SELECT * FROM TABLEX WHERE last_name LIKE '%FR%'
But it can use an index if you do it like this SELECT * FROM TABLEX WHERE last_name LIKE 'FR%'. The key here is to allow as many of the first characters to not be wild cards.
Here is a link to a site with some general tips. https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=173

Multiple columns with ContainsTable and boolean logic with full text index

I have what I think is a very basic scenario, but what I've read makes it sound like this is not easy using SQL Server Full Text catalog and indexes.
I have 2 columns, First and Last name. I want to support full-text search on them such that if someone types "John Smith" people with a match on both first and last come up first.
Although it's easy to create an index across multiple columns, and easy to search multiple columns, the scoring doesn't reflect multiple columns.
SELECT [Key], Rank
FROM CONTAINSTABLE([User], (FirstName,LastName), '<CLAUSE_HERE>')
If CLAUSE_HERE is "john smith" I get no results, because that phrase does not exist in either field.
If it's "john OR smith" I get all users with either name in either field, sorted in an unhelpful order.
If it's "john AND smith" I get no results, because neither field contains both words.
Seems like the only solution is to autogenerate a query that runs containstable on each field, does some math, sums the scores, etc. Does that sound right? Is there an easier way around it? My actual query has a lot more fields to it - this is a simplified example.
Make a computed column which mushes together the fields that you're interested in searching on (in a way that makes sense for your search formats), and full-text index that.
As far as I'm aware this is the only work around if you want to full-text in this way because of the behavior you describe in your question.
You would need to test it, but I wonder if you might be able to use the ISABOUT() function to apply a weight to each keyword. Your search clause might look something like:
ISABOUT(john weight(0.2), smith weight(0.8))

What's the best way to store a title in a database to allow sorting without the leading "The", "A"

I run (and am presently completely overhauling) a website that deals with theater (njtheater.com if you're interested).
When I query a list of plays from the database, I'd like "The Merchant of Venice" to sort under the "M"s. Of course, when I display the name of the play, I need the "The" in front.
What the best way of designing the database to handle this?
(I'm using MS-SQL 2000)
You are on the right track with two columns, but I would suggest storing the entire displayable title in one column, rather than concatenating columns. The other column is used purely for sorting. This gives you complete flexibility over sorting and display, rather than being stuck with a simple prefix.
This is a fairly common approach when searching (which is related to sorting). One column (with an index) is case-folded, de-punctuated, etc. In your case, you'd also apply the grammatical convention of removing leading articles to the values in this field. This column is then used as a comparison key for searching or sorting. The other column is not indexed, and preserves the original key for display.
Store the title in two fields: TITLE-PREFIX and TITLE-TEXT (or some such). Then sort on the second, but display the concatenation of the two, with a space between.
My own solution to the problem was to create three columns in the database.
article varchar(4)
sorttitle varchar(255)
title computed (article + sortitle)
"article" will only be either "The ", "A " "An " (note trailing space on each) or empty string (not null)
"sorttitle" will be the title with the leading article removed.
This way, I can sort on SORTTITLE and display TITLE. There's little actual processing going on the computed field (so it's fast), and there's only a little work to be done when inserting.
I agree with doofledorfer, but I would recommend storing spaces entered as part of the prefix instead of assuming it's a single space. It gives your users more flexibility. You may also be able to do some concatenation in your query itself, so you don't have to merge the fields as part of your business logic.
I don't know if this can be done in SQL Server. If you can create function based indexes you could create one that does a regex on the field or that uses your own function. This would take less space than an additional field, would be kept up to date by the database itself, and allows the complete title to be stored together.

Resources