full index checkbox while creating new database - sql-server

I am creating a new database, which I am basically designing for the logging/history purpose. So, I'll make around 8-10 tables in this database. Which will keep the data and I'll retrieve it for showing history information to the user.
I am creating database from the SQL Server 2005 and I can see that there is a check box of " Use full Indexing". I am not sure whether I make it check or unchecked. As I am not familiar with the database too much, suggest me that by checking it, will it increase the performance of my database in retrieval?

I think that is the check box for FULLTEXT indexing.
You turn it on only if you plan to do some natural language queries or a lot of text-based queries.
See here for a description of what it is used to support.
http://msdn.microsoft.com/en-us/library/ms142571.aspx
From that base link, you can follow through to http://msdn.microsoft.com/en-us/library/ms142547.aspx (amongst others). Interesting is this quote
Comparison of LIKE to Full-Text Search
In contrast to full-text search, the LIKE Transact-SQL predicate works
on character patterns only. Also, you cannot use the LIKE predicate to
query formatted binary data. Furthermore, a LIKE query against a large
amount of unstructured text data is much slower than an equivalent
full-text query against the same data. A LIKE query against millions
of rows of text data can take minutes to return; whereas a full-text
query can take only seconds or less against the same data, depending
on the number of rows that are returned.
There is a cost for this of course which is in the storage of the patterns and relationships between words in the same record. It is really useful if you are storing articles for example, where you want to enable searching by "contains a, b and c". A LIKE pattern would be complicated and extremely slow to process like %A%B%C% OR LIKE '%B%A%C' Or ... and all the permutations for the order of appearance of A, B and C.

Related

Databases: Effectively implement string contains query

I need a way to effectively do a string contains query like:
# In SQL
LIKE '%some-string%'
# In mongo
{ $regex: /some-string/ }
But its very slow when the dataset size is big. Eg. I tried in a dummy DB (with and without an index - no index is surprisingly faster on mongo) and generate 100m rows (in reality theres more). Seems reasonable if I use ElasticSearch, but I am wondering if theres a DB or way I can structure my data to optimise this use case? I asked and I really need contains instead of a prefix match ...
Postgresql offers so-called trigram indexes. Those indexes can accelerate SQL col LIKE '%search%' predicates efficiently enough. Notice that indexing can, in all makes of server, speed up col LIKE 'string%' (without the leading wildcard character).
MySQL / Mariadb have FULLTEXT indexes that work with a distinctive SQL syntax. That feature works word-by-word unlike, well, LIKE which works character-by-character. Microsoft SQL Server has a similar feature with different syntax. It also works word-by-word.
So, there's no SQL standard way to do this efficiently, and different makes of server do it differently.
If you haven't yet chosen a particular make of server, you should figure out whether one of the full text schemes will serve your purpose. If you must get good performance from LIKE,
postgresql's trigram indexing is the way to go.
There's no general solution to this that works for all database systems i think. As another answer already explains, there are fulltext search extensions to a lot of popular database systems that, while they're far from being able to do what stuff like Lucene/ElasticSearch can do, should be enough to massively speed up your use case.
Let me explain this from a database internals perspective. Let's say that your selectivity is high a.k.a only a very small percentage of your tuples actually match your condition then you would generally want to have some kind of index structure. The kind of index structure you would need for this kind of query is some kind of Radix-Tree/Trie but that's not a standard data structure implemented in all SQL databases. The only data structure that is actually implemented in almost all SQL databases is a B-Tree. But a B-Tree can only do Prefix queries something like LIKE 'test%'. The only chance you have for LIKE '%test%' if your database doesn't have such indexes is having a very fast runtime system which none of the traditional (open source) database systems has...

Will indexing improve varchar(max) query performance, and how to create index

Firstly, I should point out I don't have much knowledge on SQL Server indexes.
My situation is that I have an SQL Server 2008 database table that has a varchar(max) column usually filled with a lot of text.
My ASP.NET web application has a search facility which queries this column for keyword searches, and depending on the number of keywords searched for their may be one or many LIKE '%keyword%' statements in the SQL query to do the search.
My web application also allows searching by various other columns in this table as well, not just that one column. There is also a few joins from other tables too.
My question is, is it worthwhile creating an index on this column to improve performance of these search queries? And if so, what type of index, and will just indexing the one column be enough or do I need to include other columns such as the primary key and other searchable columns?
The best analogy I've ever seen for why an index won't help '%wildcard%' searches:
Take two people. Hand each one the same phone book. Say to the person on your left:
Tell me how many people are in this phone book with the last name "Smith."
Now say to the person on your right:
Tell me how many people are in this phone book with the first name "Simon."
An index is like a phone book. Very easy to seek for the thing that is at the beginning. Very difficult to scan for the thing that is in the middle or at the end.
Every time I've repeated this in a session, I see light bulbs go on, so I thought it might be useful to share here.
you cannot create an index on a varchar(max) field. The maximum amount of bytes on a index is 900. If the column is bigger than 900 bytes, you can create the index but any insert with more then 900 bytes will fail.
I suggest you to read about fulltext search. It should suits you in this case
It's not worthwhile creating a regular index if you're doing LIKE '%keyword%' searches. The reason is that indexing works like searching a dictionary, where you start in the middle then split the difference until you find the word. That wildcard query is like asking you to lookup a word that contains the text "to" or something-- the only way to find matches is to scan the whole dictionary.
You might consider a full-text search, however, which is meant for this kind of scenario (see here).
The best way to find out is to create a bunch of test queries that resemble what would happen in real life and try to run them against your DB with and without the index. However, in general, if you are doing many SELECT queries, and little UPDATE/DELETE queries, an index might make your queries faster.
However, if you do a lot of updates, the index might hurt your performance, so you have to know what kind of queries your DB will have to deal with before you make this decision.

How to implement an Enterprise Search

We are searching disparate data sources in our company. We have information in multiple databases that need to be searched from our Intranet. Initial experiments with Full Text Search (FTS) proved disappointing. We've implemented a custom search engine that works very well for our purposes. However, we want to make sure we are doing "the right thing" and aren't missing any great tools that would make our job easier.
What we need:
Column search
ability to search by column
we flag which columns in a table are searchable
Keep some relation between db column and data
we provide advanced filtering on the results
facilitates (amazon style) filtering
filter provided by grouping of results and allowing user to filter them via a checkbox
this is a great feature, users like it very much
Partial Word Match
we have a lot of unique identifiers (product id, etc).
the unique id's can have sub parts with meaning (location, etc)
or only a portion may be available (when the user is searching)
or (by a decidedly poor design decision) there may be white space in the id
this is a major feature that we've implemented now via CHARINDEX (MSSQL) and INSTR (ORACLE)
using the char index functions turned out to be equivalent performance(+/-) on MSSQL compared to full text
didn't test on Oracle
however searches against both types of db are very fast
We take advantage of Indexed (MSSQL) and Materialized (Oracle) views to increase speed
this is a huge win, Oracle Materialized views are better than MSSQL Indexed views
both provide speedups in read-only join situations (like a search combing company and product)
A search that matches user expectations of the paradigm CTRL-f -> enter text -> find matches
Full Text Search is not the best in this area (slow and inconsistent matching)
partial matching (see "Partial Word Match")
Nice to have:
Search database in real time
skip the indexing skip, this is not a hard requirement
Spelling suggestion
Xapian has this http://xapian.org/docs/spelling.html
Similar to google's "Did you mean:"
What we don't need:
We don't need to index documents
at this point searching our data sources are the most important thing
even when we do search documents, we will be looking for partial word matching, etc
Ranking
Our own simple ranking algorithm has proven much better than an FTS equivalent.
Users understand it, we understand it, it's almost always relevant.
Stemming
Just don't need to get [run|ran|running]
Advanced search operators
phrase matching, or/and, etc
according to Jakob Nielsen http://www.useit.com/alertbox/20010513.html
most users are using simple search phrases
very few use advanced searches (when it's available)
also in Information Architecture 3rd edition Page 185
"few users take advantage of them [advanced search functions]"
http://oreilly.com/catalog/9780596000356
our Amazon like filtering allows better filtering anyway (via user testing)
Full Text Search
We've found that results don't always "make sense" to the user
Searching with FTS is hard to tune (which set of operators match the users expectations)
Advanced search operators are a no go
we don't need them because
users don't understand them
Performance has been very close (+/1) to the char index functions
but the results are sometimes just "weird"
The question:
Is there a solution that allows us to keep the key value pair "filtering feature", offers the column specific matching, partial word matching and the rest of the features, without the pain of full text search?
I'm open to any suggestion. I've wondered if a document/hash table nosql data store (MongoDB, et al) might be of use? ( http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo ). Any experience with these is appreciated.
Again, just making sure we aren't missing something with our in-house customized version. If there is something "off the shelf" I would be interested in it. Or if you've built something from some components, what components (search engines, data stores, etc) did you use and why?
You can also make your point for FTS. Just make sure it meets the requirements above before you say "just use Full Text Search because that's the only tool we have."
I ended up coding my own.
The results are fantastic. Users like it, it works well with our existing technologies.
It really wasn't that hard. Just took some time.
Features:
Faceted search (amazon, walmart, etc)
Partial word search (the real stuff not full text)
Search databases (oracle, sql server, etc) and non database sources
Integrates well with our existing environment
Maintains relations, so I can have a n to n search and display
--> this means I can display child records of a master record in search results
--> also I can search any child field and return the master record
It's really amazing what you can do with dictionaries and a lot of memory.
I recommend looking into Solr, I believe it will meet you needs:
http://lucene.apache.org/solr/
For an off-she-shelf solution: Have you checked out the Google Search Appliance?
Quote from the Google Mini/GSA site:
... If direct database indexing is a requirement for you, we encourage you to consider the Google Search Appliance, which has direct database connectivity.
And of course it indexes everything else in the Googly manner you'd expect it to.
Apache Solr is a good way to start your project with and it is open source . You can also try Elastic Search and there are a lot of off shelf products which offer good customization abilities and search features such as Coveo, SharePoint Fast, Google ...

Creating an efficient search capability using SQL Server (and/or coldfusion)

I am trying to visualize how to create a search for an application that we are building. I would like a suggestion on how to approach 'searching' through large sets of data.
For instance, this particular search would be on a 750k record minimum table, of product sku's, sizing, material type, create date, etc;
Is anyone aware of a 'plugin' solution for Coldfusion to do this? I envision a google like single entry search where a customer can type in the part number, or the sizing, etc, and get hits on any or all relevant results.
Currently if I run a 'LIKE' comparison query, it seems to take ages (ok a few seconds, but still), and it is too long. At times making a user sit there and wait up to 10 seconds for queries & page loads.
Or are there any SQL formulas to help accomplish this? I want to use a proven method to search the data, not just a simple SQL like or = comparison operation.
So this is a multi-approach question, should I attack this at the SQL level (as it ultimately looks to be) or is there a plug in/module for ColdFusion that I can grab that will give me speedy, advanced search capability.
You could try indexing your db records with a Verity (or Solr, if CF9) search.
I'm not sure it would be faster, and whether even trying it would be worthwhile would depend a lot on how often you update the records you need to search. If you update them rarely, you could do an Verity Index update whenever you update them. If you update the records constantly, that's going to be a drag on the webserver, and certainly mitigate any possible gains in search speed.
I've never indexed a database via Verity, but I've indexed large collections of PDFs, Word Docs, etc, and I recall the search being pretty fast. I don't know if it will help your current situation, but it might be worth further research.
If your slowdown is specifically the search of textual fields (as I surmise from your mentioning of LIKE), the best solution is building an index table (not to be confiused with DB table indexes that are also part of the answer).
Build an index table mapping the unique ID of your records from main table to a set of words (1 word per row) of the textual field. If it matters, add the field of origin as a 3rd column in the index table, and if you want "relevance" features you may want to consider word count.
Populate the index table with either a trigger (using splitting) or from your app - the latter might be better, simply call a stored proc with both the actual data to insert/update and the list of words already split up.
This will immediately drastically speed up textual search as it will no longer do "LIKE", AND will be able to use indexes on index table (no pun intended) without interfering with indexing on SKU and the like on the main table.
Also, ensure that all the relevant fields are indexed fully - not necessarily in the same compund index (SKU, sizing etc...), and any field that is searched as a range field (sizing or date) is a good candidate for a clustered index (as long as the records are inserted in approximate order of that field's increase or you don't care about insert/update speed as much).
For anything mode detailed, you will need to post your table structure, existing indexes, the queries that are slow and the query plans you have now for those slow queries.
Another item is to enure that as little of the fields are textual as possible, especially ones that are "decodable" - your comment mentioned "is it boxed" in the text fields set. If so, I assume the values are "yes"/"no" or some other very limited data set. If so, simply store a numeric code for valid values and do en/de-coding in your app, and search by the numeric code. Not a tremendous speed improvement but still an improvement.
I've done this using SQL's full text indexes. This will require very application changes and no changes to the database schema except for the addition of the full text index.
First, add the Full Text index to the table. Include in the full text index all of the columns the search should perform against. I'd also recommend having the index auto update; this shouldn't be a problem unless your SQL Server is already being highly taxed.
Second, to do the actual search, you need to convert your query to use a full text search. The first step is to convert the search string into a full text search string. I do this by splitting the search string into words (using the Split method) and then building a search string formatted as:
"Word1*" AND "Word2*" AND "Word3*"
The double-quotes are critical; they tell the full text index where the words begin and end.
Next, to actually execute the full text search, use the ContainsTable command in your query:
SELECT *
from containstable(Bugs, *, '"Word1*" AND "Word2*" AND "Word3*"')
This will return two columns:
Key - The column identified as the primary key of the full text search
Rank - A relative rank of the match (1 - 1000 with a higher ranking meaning a better match).
I've used approaches similar to this many times and I've had good luck with it.
If you want a truly plug-in solution then you should just go with Google itself. It sounds like your doing some kind of e-commerce or commercial site (given the use of the term 'SKU'), So you probably have a catalog of some kind with product pages. If you have consistent markup then you can configure a google appliance or service to do exactly what you want. It will send a bot in to index your pages and find your fields. No SQl, little coding, it will not be dependent on your database, or even coldfusion. It will also be quite fast and familiar to customers.
I was able to do this with a coldfusion site in about 6 hours, done! The only thing to watch out for is that google's index is limited to what the bot can see, so if you have a situation where you want to limit access based on a users role or permissions or group, then it may not be the solution for you (although you can configure a permission service for Google to check with)
Because SQL Server is where your data is that is where your search performance is going to be a possible issue. Make sure you have indexes on the columns you are searching on and if using a like you can't use and index if you do this SELECT * FROM TABLEX WHERE last_name LIKE '%FR%'
But it can use an index if you do it like this SELECT * FROM TABLEX WHERE last_name LIKE 'FR%'. The key here is to allow as many of the first characters to not be wild cards.
Here is a link to a site with some general tips. https://web.archive.org/web/1/http://blogs.techrepublic%2ecom%2ecom/datacenter/?p=173

Multi Criteria Search Algorithm

Here's the problem : I've got a huuge (well at my level) mysql database with technical products in it. I ve got something like 150k rows of products in my database plus 10 to 20 others tables with the same amount of rows. Each tables contains a lot of criteria. Some of the criteria are text values, some are decimal, some are just boolean. I would like to provide a web access (php) to this database with filters on each criteria but I dont know how to do that really fast. I started to create a big table with all colums merged to avoid multiple join, it's cool, faster than the big join but still very very slow. Putting an index on all criteria, doesnt improve things (and i heard it was a bad idea). I was wondering if there were some cool algorithms that could help me preprocess the multi criteria search. Any idea ?
Thanks ahead.
If you're frustrated trying to do this in SQL, you could take a look at Lucene. It lets you do range searches, full text, etc.
Try Full Text Search
You might want to try globbing your text fields together and doing full text search.
Optimize Your Queries
For the other columns, rank them in order of how frequently you expect them to be used.
Write a test suite of queries, and run them all to get a sense of the performance. Then start adding indexes, and see how it affects performance. Keep adding indexes while the performance gets better. Stop when it gets worse.
Use Explain Plan
Since you didn't provide your SQL or table layout, I can't be more specific. But use the Explain Plan command to make sure your queries are hitting indexes, rather than doing table scans. This can be tricky since subtle stuff like the order of the columns in the query can affect whether or not an index is operative.

Resources