Providing a "user-friendly" interface to fulltext search - sql-server

We use SQL Server fulltext search for several tables on our application. We always use AND searches, e.g.:
"Evil" returns "Evil Dead" and "The Evil of Fu Manchu"
"Evil Fu" returns only "The Evil of Fu Manchu"
We want to keep the interface down to just a single search box and button, and we don't want people to have to learn much (if any) in the way of special syntax. We use CONTAINS rather than FREETEXTTABLE because of the AND requirement.
The limitation is that CONTAINS does not seem to match synonyms. My question is really: does anyone out there have a pattern for "interpreting" user input to useful fulltext syntax?

If you had a synonym table you could look up a set of terms and run those through your CONTAINS query.
You could use something like the Big Huge Thesaurus API
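A minimal sketch of the "interpret the user input" step, assuming the synonyms have already been fetched (from your own table or an external thesaurus) into a plain dictionary. The function name and the dictionary shape are made up for illustration; only the quoted-term, AND and OR syntax comes from CONTAINS itself.

```python
import re

def to_contains_query(user_input, synonyms=None):
    """Turn free-form input into an AND-ed CONTAINS search condition.

    `synonyms` is a hypothetical mapping: lower-cased term -> list of alternatives.
    The alternatives are assumed to be clean (no double quotes inside them).
    """
    synonyms = synonyms or {}
    # Keep only word characters, so quotes or operators typed by the user
    # never reach the full-text syntax.
    terms = re.findall(r"\w+", user_input)
    clauses = []
    for term in terms:
        alternatives = [term] + list(synonyms.get(term.lower(), []))
        quoted = " OR ".join(f'"{alt}"' for alt in alternatives)
        # Parenthesise only when there is more than one alternative.
        clauses.append(f"({quoted})" if len(alternatives) > 1 else quoted)
    # An empty result means there is nothing to search for; the caller
    # should skip the query in that case rather than pass "" to CONTAINS.
    return " AND ".join(clauses)
```

For "Evil Fu" this produces "Evil" AND "Fu"; with a synonym entry for "evil" the first term becomes ("Evil" OR "wicked"). Pass the resulting string to CONTAINS as a query parameter rather than concatenating it into the SQL text. CONTAINS also accepts FORMSOF(THESAURUS, ...) terms backed by a configured thesaurus file, which this sketch does not use.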

Related

Azure SQL Server - Full Text Search - Partial Words/Leading Wildcard

I've seen several questions on SO about the possibility of matching partial words in a Full-Text Search on SQL Server but they are all quite old so I'm posting to see if there is an update on the situation...
The Problem:
I have a keyword search running on a single field in a table that is using Full-Text Search.
I want to be able to match a partial word, not just a wildcard search from the start of a given word.
So, I know I can do:
CONTAINS(table.myfield, '"par*"'), which will match things like party, partner, etc.
I also want to be able to say:
CONTAINS(table.myfield, '"*par*"'), to match things like spartan, sparing, etc.
Is it true to say that FTS cannot achieve this and I would have to resort to LIKE '%par%' to get the results I require?
Full-Text Search still does not allow a leading (double) wildcard, so for true substring matches inside SQL Server you are left with LIKE '%par%'. However, you can now use Azure Search to perform regular expression searches on multiple columns at the same time using Lucene syntax. For example, to search for all jobs with either the term Senior or Junior you can do the following search:
&queryType=full&$select=business_title&search=business_title:/(Sen|Jun)ior/
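If you build that query string in code, the regular expression has to be URL-encoded. A small sketch, assuming only the three parameters shown above; the service URL, index name and any other parameters are not given here, so they are left out.

```python
from urllib.parse import urlencode

def build_search_params(field, pattern, select):
    """Return the URL-encoded query-string fragment for a Lucene regex search.

    Only queryType, $select and search are taken from the example above;
    everything else a real request needs is deliberately omitted.
    """
    params = {
        "queryType": "full",               # full Lucene syntax
        "$select": select,                 # columns to return
        "search": f"{field}:/{pattern}/",  # regex is wrapped in slashes
    }
    return urlencode(params)
```

Calling build_search_params("business_title", "(Sen|Jun)ior", "business_title") yields the same three parameters as the line above, with the special characters percent-encoded.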

Wildcard search in cassandra database

I want to know if there is any way to perform wildcard searches in cassandra database.
e.g.
select KEY,username,password from User where username='*hello*';
Or
select KEY,username,password from User where username='%hello%';
something like this.
There is no native way to perform such queries in Cassandra. Typical options to achieve the same are
a) Maintain an index yourself on likely search terms. For example, whenever you are inserting an entry which has hello in the username, insert an entry in the index column family with hello as the key and the column value as the key of your data entry. While querying, query the index CF and then fetch data from your data CF. Of course, this is pretty restrictive in nature but can be useful for some basic needs.
b) A better bet is to use a full text search engine. Take a look at Solandra, https://github.com/tjake/Solandra or Datastax enterprise http://www.datastax.com/products/enterprise
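Option a) can be sketched roughly as follows. This is an in-memory illustration only: two Python dictionaries stand in for the data and index column families, and the list of likely terms is an assumption, not something Cassandra provides.

```python
from collections import defaultdict

# In-memory stand-ins for the two column families (hypothetical names).
data_cf = {}                  # row key -> user record
index_cf = defaultdict(set)   # search term -> set of row keys

# Assumed list of terms we expect people to search for.
LIKELY_TERMS = ["hello", "admin"]

def insert_user(key, username, password):
    """Write the data row, then one index entry per likely term it contains."""
    data_cf[key] = {"username": username, "password": password}
    lowered = username.lower()
    for term in LIKELY_TERMS:
        if term in lowered:
            index_cf[term].add(key)

def find_users(term):
    """Query the index first, then fetch the matching data rows."""
    keys = index_cf.get(term.lower(), set())
    return [data_cf[k] for k in sorted(keys)]
```

The restriction mentioned above is visible here: a term that is not in the likely-terms list at insert time can never be found later without re-indexing.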
This project also looks promising
http://tuplejump.github.io/stargate/
I have not looked deeply at it recently, but when I last evaluated it, it looked promising.

Full Text Search and LIKE statement

Does the GAE experimental Full Text Search API provide an alternative to the SQL "LIKE statement"?
Thanks!
No. The SQL like statement supports arbitrary substring matching - for instance, "abbatton" will be a match for "bat" - while fulltext search implements full text indexing, which uses normalization, stemming, and an inverted index to construct an index that is good at answering the sort of queries users tend to enter for textual documents.
If you mean "does the Full Text search API provide an alternative for what SQL's LIKE operator is commonly (mis)used for", the answer is yes - since the most common application for SQL's LIKE appears to be fulltext search, the Full Text Search API is actually better suited to this than LIKE for a number of reasons, including efficiency ('LIKE' requires scanning every row of the table), accuracy (Full Text search provides ranking, stemming, and other features), and eliminating false positives (see the example above).
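To make the difference concrete, here is a toy comparison, assuming a very simplified full-text index (lower-casing and word splitting only, no stemming or ranking). It is not how the Full Text Search API is implemented, just the general idea of an inverted index.

```python
import re
from collections import defaultdict

def substring_match(docs, needle):
    """What LIKE '%needle%' does: scan every document for the substring."""
    return [i for i, text in enumerate(docs) if needle.lower() in text.lower()]

def build_inverted_index(docs):
    """Map each normalized word to the set of documents containing it."""
    index = defaultdict(set)
    for i, text in enumerate(docs):
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(i)
    return index

def fulltext_match(index, query):
    """Return documents containing every query word (lookup, no scan)."""
    tokens = re.findall(r"\w+", query.lower())
    if not tokens:
        return []
    return sorted(set.intersection(*(index.get(t, set()) for t in tokens)))
```

With the documents "abbatton" and "A bat flew past", the substring search for "bat" returns both, while the index lookup returns only the second one.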

What is a suitable index for a text field in postgres database?

I have a database that stores details of code check-ins from various SCRs. One of the tables in this database stores the commit comments for each check-in. I am trying to develop a search feature which, with the help of Postgres POSIX notation, searches through this table, matches a regular expression against the comment field, and returns all the matches.
I have already got this to work, but the main problem is the performance of the search. For a fairly big database a search takes almost 15-20 minutes to complete, and as it's a web frontend waiting for the result, this is a totally unacceptable time for a medium-sized database.
I figured that creating an index on this text field might help, but I am unable to create a btree index because the data in some of the rows is too big for Postgres to index.
Is there any other solution to this? Are there any other indexes that can be created which again should not be language dependent?
Check the full text search functions; regular expressions can't use ordinary indexes.
Now you can also use the pg_trgm extension.
Documentation:
http://www.postgresql.org/docs/9.1/static/pgtrgm.html
Good start point:
http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/
Yeah, Full Text Searching is your answer here. PostgreSQL has a pretty robust and fast FTS capability.
Others have mentioned full text searching. If you need regular expressions rather than full text searching, there is no way to index them in a generic way. As long as the expression is anchored at the beginning of the string (using ^ at the start), an index can usually be used, but for generic regular expressions, there is no way to use an index for searching them.
Use the pg_trgm extension:
CREATE EXTENSION pg_trgm;
Then you can create an index on the name field like this:
CREATE INDEX tmp ON companies USING GIN (name gin_trgm_ops);
This index will be used for searches like:
SELECT * FROM companies WHERE name ~* 'jet';
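Roughly why a trigram index helps with this kind of search: every string is broken into three-character pieces, and only rows sharing all of the needle's pieces need to be checked. The sketch below is a simplification for illustration; pg_trgm's real trigram extraction (padding, word handling) differs in detail.

```python
from collections import defaultdict

def trigrams(text):
    """All three-character substrings of the lower-cased text (simplified)."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}

def build_index(names):
    """Map each trigram to the ids of the rows that contain it."""
    index = defaultdict(set)
    for row_id, name in enumerate(names):
        for gram in trigrams(name):
            index[gram].add(row_id)
    return index

def search(names, index, needle):
    """Narrow down candidates via the index, then recheck them exactly."""
    grams = trigrams(needle)
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Needle shorter than three characters: the index cannot help.
        candidates = set(range(len(names)))
    return [names[i] for i in sorted(candidates) if needle.lower() in names[i].lower()]
```

The recheck at the end matters: sharing all trigrams does not guarantee that the needle appears contiguously, so the index only reduces the number of rows that have to be compared.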

How to implement an Enterprise Search

We are searching disparate data sources in our company. We have information in multiple databases that need to be searched from our Intranet. Initial experiments with Full Text Search (FTS) proved disappointing. We've implemented a custom search engine that works very well for our purposes. However, we want to make sure we are doing "the right thing" and aren't missing any great tools that would make our job easier.
What we need:
Column search
  ability to search by column
  we flag which columns in a table are searchable
Keep some relation between db column and data
  we provide advanced filtering on the results
  facilitates (Amazon-style) filtering
  filter provided by grouping the results and letting the user filter them via checkboxes
  this is a great feature, users like it very much
Partial Word Match
  we have a lot of unique identifiers (product id, etc.)
  the unique ids can have sub-parts with meaning (location, etc.)
  or only a portion may be available (when the user is searching)
  or (by a decidedly poor design decision) there may be white space in the id
  this is a major feature that we've implemented now via CHARINDEX (MSSQL) and INSTR (Oracle)
  using the char index functions turned out to give equivalent performance (+/-) on MSSQL compared to full text
  didn't test on Oracle
  however, searches against both types of db are very fast
We take advantage of Indexed (MSSQL) and Materialized (Oracle) views to increase speed
  this is a huge win; Oracle Materialized views are better than MSSQL Indexed views
  both provide speedups in read-only join situations (like a search combining company and product)
A search that matches user expectations of the paradigm CTRL-f -> enter text -> find matches
  Full Text Search is not the best in this area (slow and inconsistent matching)
  partial matching (see "Partial Word Match")
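To illustrate the Partial Word Match requirement, here is a small sketch of CHARINDEX/INSTR-style matching that also tolerates white space inside ids. The function names and the sample ids are invented for the example; it is not our production code.

```python
def normalize_id(value):
    """Drop all white space and upper-case, so 'ny 12' and 'NY12' compare equal."""
    return "".join(value.split()).upper()

def partial_id_match(ids, fragment):
    """Return the ids that contain the fragment anywhere (substring match)."""
    needle = normalize_id(fragment)
    if not needle:
        return []
    return [i for i in ids if needle in normalize_id(i)]
```

For example, with the ids "NY 1234-A", "CA5678-B" and "NY9912-C", the fragment "y12" matches only the first one, even though that id is stored with a space in it.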
Nice to have:
Search database in real time
  skip the indexing step; this is not a hard requirement
Spelling suggestion
  Xapian has this http://xapian.org/docs/spelling.html
  similar to Google's "Did you mean:"
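A "Did you mean:" feature can be approximated with very little code. The sketch below uses Python's standard difflib module against a vocabulary of known terms; it is a stand-in to show the idea, not how Xapian implements spelling correction, and the vocabulary is assumed to come from the searchable columns.

```python
import difflib

def suggest(word, vocabulary, limit=3):
    """Return up to `limit` known terms that look similar to `word`."""
    lowered = [term.lower() for term in vocabulary]
    # cutoff=0.6 is difflib's default similarity threshold.
    return difflib.get_close_matches(word.lower(), lowered, n=limit, cutoff=0.6)
```

For instance, suggest("serach", ["search", "stemming", "ranking"]) offers "search" as the only candidate.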
What we don't need:
We don't need to index documents
  at this point searching our data sources is the most important thing
  even when we do search documents, we will be looking for partial word matching, etc.
Ranking
  our own simple ranking algorithm has proven much better than an FTS equivalent
  users understand it, we understand it, it's almost always relevant
Stemming
  just don't need to get [run|ran|running]
Advanced search operators
  phrase matching, or/and, etc.
  according to Jakob Nielsen http://www.useit.com/alertbox/20010513.html
    most users are using simple search phrases
    very few use advanced searches (when they're available)
  also in Information Architecture 3rd edition, page 185
    "few users take advantage of them [advanced search functions]"
    http://oreilly.com/catalog/9780596000356
  our Amazon-like filtering allows better filtering anyway (via user testing)
Full Text Search
  we've found that results don't always "make sense" to the user
  searching with FTS is hard to tune (which set of operators matches the users' expectations?)
  advanced search operators are a no go
    we don't need them because
    users don't understand them
  performance has been very close (+/-) to the char index functions
  but the results are sometimes just "weird"
The question:
Is there a solution that allows us to keep the key value pair "filtering feature", offers the column specific matching, partial word matching and the rest of the features, without the pain of full text search?
I'm open to any suggestion. I've wondered if a document/hash table nosql data store (MongoDB, et al) might be of use? ( http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo ). Any experience with these is appreciated.
Again, just making sure we aren't missing something with our in-house customized version. If there is something "off the shelf" I would be interested in it. Or if you've built something from some components, what components (search engines, data stores, etc) did you use and why?
You can also make your point for FTS. Just make sure it meets the requirements above before you say "just use Full Text Search because that's the only tool we have."
I ended up coding my own.
The results are fantastic. Users like it, it works well with our existing technologies.
It really wasn't that hard. Just took some time.
Features:
Faceted search (amazon, walmart, etc)
Partial word search (the real stuff not full text)
Search databases (oracle, sql server, etc) and non database sources
Integrates well with our existing environment
Maintains relations, so I can have an n-to-n search and display
--> this means I can display child records of a master record in search results
--> also I can search any child field and return the master record
It's really amazing what you can do with dictionaries and a lot of memory.
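As a rough idea of the dictionary-based approach, here is a stripped-down sketch of partial-word search plus facet counts over in-memory records. The field names and the record layout are invented for the example; the real thing has more to it (relations, multiple sources).

```python
from collections import Counter

def search(records, text, searchable, filters=None):
    """Partial-word match over the flagged columns, with optional exact filters."""
    filters = filters or {}
    needle = text.lower()
    hits = []
    for rec in records:
        # Apply the checkbox-style filters first (exact match per field).
        if any(rec.get(field) != value for field, value in filters.items()):
            continue
        # Substring match against every searchable column.
        if any(needle in str(rec.get(col, "")).lower() for col in searchable):
            hits.append(rec)
    return hits

def facets(hits, facet_fields):
    """Count the hits per value of each facet field, for the filter sidebar."""
    return {field: Counter(rec.get(field) for rec in hits) for field in facet_fields}
```

The facet counts are computed from the current hits, so each checkbox can show how many results remain if the user ticks it.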
I recommend looking into Solr; I believe it will meet your needs:
http://lucene.apache.org/solr/
For an off-the-shelf solution: Have you checked out the Google Search Appliance?
Quote from the Google Mini/GSA site:
... If direct database indexing is a requirement for you, we encourage you to consider the Google Search Appliance, which has direct database connectivity.
And of course it indexes everything else in the Googly manner you'd expect it to.
Apache Solr is a good way to start your project, and it is open source. You can also try Elasticsearch, and there are a lot of off-the-shelf products that offer good customization abilities and search features, such as Coveo, SharePoint FAST, Google ...

Resources