Azure SQL Server - Full Text Search - Partial Words/Leading Wildcard - sql-server

I've seen several questions on SO about the possibility of matching partial words in a Full-Text Search on SQL Server but they are all quite old so I'm posting to see if there is an update on the situation...
The Problem:
I have a keyword search running on a single field in a table that is using Full-Text Search.
I want to be able to match a partial word, not just a wildcard search from the start of a given word.
So, I know I can do:
Contains(table.myfield, '"par*"') which will match things like party, partner, etc.
I also want to be able to say:
Contains(table.myfield, '"*par*"') to match things like spartan, sparing, etc.
Is it true to say that FTS cannot achieve this and I would have to resort to LIKE '%par%' to get the results I require?

Full-Text Search still does not support leading or double wildcards. However, you can now use Azure Search to run regular-expression searches on multiple columns at the same time using Lucene syntax, as explained here. For example, to search for all jobs with either the term Senior or Junior, you can run the following search:
&queryType=full&$select=business_title&search=business_title:/(Sen|Jun)ior/
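A rough way to see why the prefix form is cheap while the leading-wildcard form is not: index terms can be kept sorted, so every term starting with par sits in one contiguous run, whereas terms that merely contain par are scattered. The Python sketch below only illustrates that idea. It is not how SQL Server or Azure Search is actually implemented, and the function names and sample terms are made up.

```python
import bisect

def prefix_matches(sorted_terms, prefix):
    """Jump to the first candidate with binary search, then read the contiguous run."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order guarantees no later term can match
        matches.append(term)
    return matches

def infix_matches(terms, fragment):
    """Sorting does not help here: every term has to be inspected."""
    return [term for term in terms if fragment in term]

# Hypothetical index terms, kept sorted as a term dictionary would be.
terms = sorted(["partner", "party", "sparing", "spartan", "zebra"])
print(prefix_matches(terms, "par"))
print(infix_matches(terms, "par"))
```

The prefix lookup stops as soon as it leaves the run of matching terms, while the infix lookup always walks the whole list, which is the same cost profile as LIKE '%par%'.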

Related

Difference between full text and free text search in solr (other search db)

New to search databases and working with one. What is the difference between full text and free text search/index?
They are more or less the same. More precisely, they are just synonyms.
They are techniques used by search engines to find results in a database.
Solr uses the Lucene project for its search engine. It is used when you have large documents to search and LIKE queries on a normal RDBMS would be too slow.
Mainly it follows two stages: indexing and searching. The indexing stage scans the text of all the documents and builds a list of search terms. In the search stage, when performing a specific query, only the index is referenced, rather than the text of the original documents.
Suppose you type John and Ryan: the query will return all documents that contain either "John" or "Ryan". Order and case do not matter.
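The two stages can be sketched with a tiny inverted index. This is a simplified illustration in Python, not Lucene's actual data structures; the function names and sample documents are made up, and real engines add analysis steps (tokenizers, stop words, stemming) that are left out here.

```python
from collections import defaultdict

def build_index(docs):
    """Indexing stage: map each lowercased term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_any(index, query):
    """Search stage: consult only the index and return documents matching any query term."""
    result = set()
    for term in query.lower().split():
        result |= index.get(term, set())
    return result

# Hypothetical documents.
docs = {1: "John went home", 2: "Ryan stayed late", 3: "Nobody else came"}
index = build_index(docs)
print(sorted(search_any(index, "John Ryan")))
```

Lowercasing at both index and query time is what makes case irrelevant, and taking the union of the posting sets is what makes word order irrelevant.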
In a nutshell, unless you are using the terms in a specific context, they are just two different names for the same person.
Call him Cristiano or CR7, they are the same :)

Fulltext search with partial strings on Postgresql

I was assigned to develop full-text search functionality on PostgreSQL 9.3, and I'd be very glad to hear other opinions and advice on this matter.
The problem is that I need to implement partial word matching. A user will send a string that can contain partial words, separated by spaces, in any order.
For example: string "lue ped zeb" should find a row with "Blue striped zebra" in it (in one column). It should be case-insensitive and the order of words should not matter (but these conditions are insignificant in this question).
Problem is performance. There are over 5 million rows in the database table on which the search is performed and I need to get to very small execution times.
Example query would be "SELECT * FROM table WHERE LOWER(text) LIKE ('%lue%ped%zeb%');", which I suspect will be VERY slow because the wildcard in first position will cause the query to ignore indexes.
So far, I've found http://www.sai.msu.su/~megera/wiki/wildspeed, which is an index that could help me (size of the index doesn't really matter in this case), but the production server is running MS Windows and I don't know if this extension will compile on Windows. (I will try it and update my question).
I'm not a database developer and usually use Postgres only from applications, so I don't have much experience in database optimization and lower-level operations.
Does anyone have some experience with similar problem, word of advice or example that can help me with this task?
Trigram (pg_trgm) is a contrib module for Postgres that can help you achieve your goal. There is a complete example of its usage in the docs.
Beginning in 9.1, trigram indexes support index searches for the LIKE and ILIKE operators.
Beginning in 9.3, they support index searches for regular-expression matches (the ~ and ~* operators).
But if you want to search for the provided partial words in any order, you should query for each word separately:
...
WHERE LOWER(text) LIKE '%lue%'
OR LOWER(text) LIKE '%ped%'
OR LOWER(text) LIKE '%zeb%'
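To show why a trigram index can serve a pattern with a leading wildcard, here is a simplified Python sketch of the idea. It is not pg_trgm's implementation (the real module pads words with spaces and treats short patterns differently), and all names and rows are hypothetical. Note that it requires every fragment to match, which is what the "lue ped zeb" example in the question asks for; replacing the intersection in find_all with a union gives the OR behaviour of the query above.

```python
from collections import defaultdict

def trigrams(text):
    """All 3-character substrings of the lowercased text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(rows):
    """Map each trigram to the ids of rows that contain it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for gram in trigrams(text):
            index[gram].add(row_id)
    return index

def find_fragment(rows, index, fragment):
    """Use the index to pick candidate rows, then recheck with a real substring test."""
    grams = trigrams(fragment)
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        candidates = set(rows)  # fragment shorter than 3 chars: fall back to a full scan
    needle = fragment.lower()
    return {row_id for row_id in candidates if needle in rows[row_id].lower()}

def find_all(rows, index, query):
    """Rows containing every space-separated fragment, in any order."""
    result = set(rows)
    for fragment in query.split():
        result &= find_fragment(rows, index, fragment)
    return result

rows = {1: "Blue striped zebra", 2: "Red spotted zebra", 3: "Blue whale"}
index = build_index(rows)
print(find_all(rows, index, "lue ped zeb"))
```

The recheck step matters because sharing all trigrams of a fragment does not by itself guarantee that the fragment occurs contiguously in the row.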

database for google type searches

We need to be able to perform fast searches against 10 million tweets we have stored. Any suggestions for a good database to use for this? We'd prefer to be able to do regular-expression searches, but it's sufficient to be able to find all entries that contain a given word.
thanks - dave
Answer at Microsoft MSDN forum - database for bing type searches
Full-Text queries perform a linguistic search against this data,
operating on words and phrases based on rules of a particular
language.
A LIKE query against millions of rows of text data can
take minutes to return; whereas a full-text query can take only
seconds or less against the same data, depending on the number of rows
that are returned. We can use Full-Text Search to perform a fuzzy
search and then use LIKE clause to return the records that have an
exact match of our search conditions.
For more information, please refer to the following links:
Full-Text Search Overview:
http://msdn.microsoft.com/en-us/library/ms142571.aspx
SQL Server 2008 Full-Text Search: Internals and Enhancements
http://technet.microsoft.com/en-us/library/cc721269(SQL.100).aspx
You could use http://incubator.apache.org/lucene.net/, which is used by Stack Overflow and RavenDB.

How to implement an Enterprise Search

We are searching disparate data sources in our company. We have information in multiple databases that need to be searched from our Intranet. Initial experiments with Full Text Search (FTS) proved disappointing. We've implemented a custom search engine that works very well for our purposes. However, we want to make sure we are doing "the right thing" and aren't missing any great tools that would make our job easier.
What we need:
Column search
ability to search by column
we flag which columns in a table are searchable
Keep some relation between db column and data
we provide advanced filtering on the results
facilitates (amazon style) filtering
filter provided by grouping of results and allowing user to filter them via a checkbox
this is a great feature, users like it very much
Partial Word Match
we have a lot of unique identifiers (product id, etc).
the unique id's can have sub parts with meaning (location, etc)
or only a portion may be available (when the user is searching)
or (by a decidedly poor design decision) there may be white space in the id
this is a major feature that we've implemented now via CHARINDEX (MSSQL) and INSTR (ORACLE)
using the char index functions turned out to be equivalent performance(+/-) on MSSQL compared to full text
didn't test on Oracle
however searches against both types of db are very fast
We take advantage of Indexed (MSSQL) and Materialized (Oracle) views to increase speed
this is a huge win, Oracle Materialized views are better than MSSQL Indexed views
both provide speedups in read-only join situations (like a search combining company and product)
A search that matches user expectations of the paradigm CTRL-f -> enter text -> find matches
Full Text Search is not the best in this area (slow and inconsistent matching)
partial matching (see "Partial Word Match")
Nice to have:
Search database in real time
skip the indexing step, this is not a hard requirement
Spelling suggestion
Xapian has this http://xapian.org/docs/spelling.html
Similar to google's "Did you mean:"
What we don't need:
We don't need to index documents
at this point searching our data sources is the most important thing
even when we do search documents, we will be looking for partial word matching, etc
Ranking
Our own simple ranking algorithm has proven much better than an FTS equivalent.
Users understand it, we understand it, it's almost always relevant.
Stemming
Just don't need to get [run|ran|running]
Advanced search operators
phrase matching, or/and, etc
according to Jakob Nielsen http://www.useit.com/alertbox/20010513.html
most users are using simple search phrases
very few use advanced searches (when it's available)
also in Information Architecture 3rd edition Page 185
"few users take advantage of them [advanced search functions]"
http://oreilly.com/catalog/9780596000356
our Amazon like filtering allows better filtering anyway (via user testing)
Full Text Search
We've found that results don't always "make sense" to the user
Searching with FTS is hard to tune (which set of operators match the users expectations)
Advanced search operators are a no go
we don't need them because
users don't understand them
Performance has been very close (+/-) to the char index functions
but the results are sometimes just "weird"
The question:
Is there a solution that allows us to keep the key value pair "filtering feature", offers the column specific matching, partial word matching and the rest of the features, without the pain of full text search?
I'm open to any suggestion. I've wondered if a document/hash table nosql data store (MongoDB, et al) might be of use? ( http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo ). Any experience with these is appreciated.
Again, just making sure we aren't missing something with our in-house customized version. If there is something "off the shelf" I would be interested in it. Or if you've built something from some components, what components (search engines, data stores, etc) did you use and why?
You can also make your point for FTS. Just make sure it meets the requirements above before you say "just use Full Text Search because that's the only tool we have."
I ended up coding my own.
The results are fantastic. Users like it, it works well with our existing technologies.
It really wasn't that hard. Just took some time.
Features:
Faceted search (amazon, walmart, etc)
Partial word search (the real stuff not full text)
Search databases (oracle, sql server, etc) and non database sources
Integrates well with our existing environment
Maintains relations, so I can have a n to n search and display
--> this means I can display child records of a master record in search results
--> also I can search any child field and return the master record
It's really amazing what you can do with dictionaries and a lot of memory.
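The answer does not show any code, so the following is only a guess at the general shape of such an approach, not the author's implementation: records held in memory as dictionaries, a substring match over the columns flagged as searchable, and facet counts computed from the hits. All names and sample data are hypothetical.

```python
from collections import Counter

def search(records, fragment, columns):
    """Return records where any searchable column contains the fragment (case-insensitive)."""
    needle = fragment.lower()
    return [
        record for record in records
        if any(needle in str(record[column]).lower() for column in columns)
    ]

def facet_counts(results, facet_column):
    """Count hits per facet value, e.g. to render Amazon-style checkbox filters."""
    return Counter(record[facet_column] for record in results)

# Hypothetical records; note the whitespace inside the product ids.
records = [
    {"product_id": "NY 1042-A", "name": "Widget", "company": "Acme"},
    {"product_id": "LA 2042-B", "name": "Gadget", "company": "Acme"},
    {"product_id": "NY 7731-C", "name": "Sprocket", "company": "Globex"},
]
hits = search(records, "042", ["product_id", "name"])
print([record["product_id"] for record in hits])
print(facet_counts(hits, "company"))
```

Because the match is a plain substring test on the stored value, partial ids and ids with embedded spaces behave the way a CTRL-F search would.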
I recommend looking into Solr; I believe it will meet your needs:
http://lucene.apache.org/solr/
For an off-the-shelf solution: Have you checked out the Google Search Appliance?
Quote from the Google Mini/GSA site:
... If direct database indexing is a requirement for you, we encourage you to consider the Google Search Appliance, which has direct database connectivity.
And of course it indexes everything else in the Googly manner you'd expect it to.
Apache Solr is a good way to start your project, and it is open source. You can also try Elasticsearch, and there are a lot of off-the-shelf products that offer good customization and search features, such as Coveo, SharePoint FAST, Google ...

Providing a "user-friendly" interface to fulltext search

We use SQL Server fulltext search for several tables on our application. We always use AND searches, e.g.:
"Evil" returns "Evil Dead" and "The Evil of Fu Manchu"
"Evil Fu" returns only "The Evil of Fu Manchu"
We want to keep the interface down to just a single search box and button, and we don't want people to have to learn much (if any) in the way of special syntax. We use CONTAINS rather than FREETEXTTABLE because of the AND requirement.
The limitation is that CONTAINS does not seem to match synonyms. My question is really: does anyone out there have a pattern for "interpreting" user input to useful fulltext syntax?
If you had a synonym table you could look up a set of terms and run those through your CONTAINS query.
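One way this could look, sketched in Python on the application side: split the user's input into words, look each word up in a synonym mapping, and build a CONTAINS search condition that ORs the synonyms within a word and ANDs the words together. The synonym dictionary here stands in for the synonym table and is hypothetical, as is the function name.

```python
def build_contains_condition(user_input, synonyms):
    """Turn plain user input into a CONTAINS search condition with synonym expansion."""
    groups = []
    for word in user_input.split():
        word = word.replace('"', "")  # strip quotes so input cannot break out of the term
        if not word:
            continue
        alternatives = [word] + synonyms.get(word.lower(), [])
        quoted = " OR ".join(f'"{alt}"' for alt in alternatives)
        # Parenthesise only when there is more than one alternative.
        groups.append(f"({quoted})" if len(alternatives) > 1 else quoted)
    return " AND ".join(groups)

# Hypothetical synonym data; in practice this would be loaded from the synonym table.
synonyms = {"evil": ["wicked"]}
print(build_contains_condition("Evil Fu", synonyms))
```

The resulting string should be passed to CONTAINS as a query parameter rather than concatenated into the SQL text.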
You could use something like the Big Huge Thesaurus API

Resources