CF9's Apache Lucene vs SQL Server's full text search? - sql-server

ColdFusion 9's full text search is now based on Apache Lucene Solr (or Verity, but it has too much limitations). We also use SQL Server.
Which one's better? Which one's easier?
UPDATE: going to use for... searching against the name & description fields of the Products table.
Thanks!

Here's my 2 cents tested with ~ 3 000 000 of images with captions (primary key + image caption text from 100 to 500 chars):
CF9's Solr implementation is fast in returning results, really easy to setup, fairly fast during building index.
SQL Server 2005 FTS wasn't good enough, tried it some time ago and didn't put it in production. SQL Server 2008 FTS is much better though, currently using it on our application. But basic setup had to be adjusted in order to get high level results.
Based on experiences of other colleagues working with huge data sets and applications mostly based on search and finding things I made my top list:
Lucene
Tuned SQL Server 2008 FTS
Solr
SQL Server 2005
Of course CF9's Solr is winner here if you are chasing fast setup since you need 3 tags to finish the job and get awesome results.

The important question: What are you going to use it for?
Can't pick the right tool for the job when you don't know what the job is ;)

Related

Automated SQL Server slow query report?

I am a developer and performance tester but not a DBA. My team is working on a performance testing tool that is specific to our software. One of the features we want it to have is the ability to generate a database report immediately after the test. Our software is database agnostic. For Oracle, I can easily create a snapshot id before and after the test and programmatically create an AWR report for those snapshots, write to a file and save with other artifacts we gather. Works great.
For SQL Server, however, there is no AWR equivalent (that I know of). I know the MDW as part of the SSMS has a UI for getting things like top 10 slow SQL and things like that. But, I have not yet found a way to programmatically create and extract a SQL performance report (preferably similar to Oracle's AWR) for SQL Server.
I am even willing to create the report myself if I can find a way to extract the raw data.
Any ideas would be greatly appreciated because searching online is not getting me anywhere.
P.S. I'm trying to do this in Java, by the way, but will accept help in any language. Thanks again!
Good news! In SQL Server 2016, you can use Query Store. This is like your flight recorder blackbox.. finding long running queries and waits. Capture baseline built in to SQL Server. You can compare before and after hardware changes and/or upgrades on queries. Maybe this similar to Oracle AWR.
Only available SQL Server 2016 and up.

machine learning ranking & DB full text search ranking

I am doing a DB Full Text Search Project, I need little assistance to start. The Database which I am going to use is SQL Server 2014 and ORACLE DB, we have huge amounts of data in our disks. I have collected some information regarding the DB full text search and ranking through sql server documentation, but now my guide says why don't you go for Learning to Rank of Machine learning concepts. I have no idea about it, so I need information about it and does it has any ranking algorithms and which can I go for and how to do as well
Thanks in advance!

Convert SQL Server queries to Postgres on the fly

I have a scenario where I get queries on a webservice that need to be executed on a database.
The source for these queries is from a physical device so I cant really change the input to my queries.
I get the queries from the device in MSSQL. Earlier the backend was in SQL Server, so things were pretty straight forward. Queries would come in and get executed as is on the DB.
Now we have migrated to Postgres and we don't have to the option to modify the input data (SQL queries).
What I want to know is. Is there any library that will do this SQL Server/T-SQL translation for me so I can run the SQL Server queries through this and execute the resulting Postgres query on the database. I searched a lot but couldn't find much that would do this. (There are libraries that convert schema from one to another but what I need is to be able to translate SQL Server queries to Postgres on the fly)
I understand there are quite a bit of nuances that will be different between SQL and postgres so a translator will be needed in between. I am open to libraries in any language(that preferably runs on linux : ) ) or if you have any other suggestions on how to go about this would also be welcome.
Thanks!
If I were in your position I would have a look on upgrading your SQL Sever to 2019 ASAP (as of today, you can find on Twitter that the officially supported production ready version is available on request). Then have a look on the Polybase feature they (re)introduced in this version. In short words it allows you to connect your MSSQL instance to other data source (like Postgres) and query the data in as they would be "normal" SQL Server DB (via T-SQL) then in the background your queries will be transformed into the native pgsql and consumed from your real source.
There is not much resources on this product (as 2019 version) yet, but it seems to be one of the most powerful features coming with this release.
This is what BOL is saying about it (unfortunately, it mostly covers the old 2016 version).
There is an excellent, yet very short presentation by Bob Ward (
Principal Architect # Microsoft) he did during SQL Bits 2019 on this topic.
The only thing I can think of that might be worth trying is SQL::Translator. It's a set of Perl modules that have been around for ages but seem to be still maintained. Whether it does what you want will depend on how detailed those queries are.
The no-brainer solution is to keep a SQL Server Express in place and introduce Triggers that call out to the Postgres database.
If this is too heavy, you can look at creating a Tabular Data Stream (TDS is SQL Server network transport) gateway with limited functionality and map each possible incoming query with any parameters to a static Postgres query. This limits any testing to a finite, small, number of cases.
This way, there is no SQL Server, and you have more control than with the trigger option.
If your terminals have a limited dialect demand then this may be practical. Attempting a general translation is very likely to be worth more than the devices cost to replace (unless you have zillions already deployed).
There is an open implementation FreeTDS that you could use if you are happy with C or Java.

what is the difference between lucene.NET and DBMS?

I am building a search engine and I finished the first phase which is spidering (fetching html documents and parsing each document to get the other links). Now I must index the content of html documents. First of all I decided to use DBMS (like SQL Server) for this purpose but I found another library called Lucene.NET.
What is the difference between lucene.NET and SQL Server and which one is better to use to index html documents? I read alot about Lucene.Net and I surprised that it gives better performance than SQL Server. Can any one explain this to me?
SQL Server is a general purpose RDBMS that is not optimized for very fast text indexing (yes, it has full text indexes, but it does lots of other things at the same time).
Lucene.NET is not a RDBMS and its main function is fast text indexing.
Not that surprising it is better at it than SQL Server.

Searching full-text fields in SQL Server to detect plagiarism

I'm storing papers in SQL Server 2005 and am looking for a way to paste in the text of a paper and then search for potential plagiarism (copied content) in the database.
What's the best way to go about this? Is there a way to get a gauge for the extent to which something is similar to something else using full-text indexing, for several paragraphs of content?
why don't you install google desktop and have it only index that one directory
then you can have google do the indexing for you
This is not really the sort of problem that full-text indexing in SQL Server is designed to solve. There's nothing built in to SQL Server that you can really use to help with this.
There are a number of specialised plagiarism detection tools, which a Google search will turn up for you. That's probably your best bet.

Resources