Searching full-text fields in SQL Server to detect plagiarism - sql-server

I'm storing papers in SQL Server 2005 and am looking for a way to paste in the text of a paper and then search for potential plagiarism (copied content) in the database.
What's the best way to go about this? Is there a way to get a gauge for the extent to which something is similar to something else using full-text indexing, for several paragraphs of content?

Why don't you install Google Desktop and have it index only that one directory?
Then Google can do the indexing for you.

This is not really the sort of problem that full-text indexing in SQL Server is designed to solve. There's nothing built in to SQL Server that you can really use to help with this.
There are a number of specialised plagiarism detection tools, which a Google search will turn up for you. That's probably your best bet.
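For a rough sense of what a "gauge of similarity" could look like outside SQL Server, here is a minimal sketch that compares overlapping word n-grams (shingles) between a submitted text and stored papers. This is a generic technique, not something the answers above describe and not how any particular plagiarism tool works. The function names, the `stored_papers` dictionary and the shingle size `k` are all assumptions made for illustration.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word n-grams) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Symmetric overlap of two shingle sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def containment(submitted, stored):
    """Fraction of the submitted shingles that also occur in the stored set."""
    if not submitted:
        return 0.0
    return len(submitted & stored) / len(submitted)


def rank_matches(submitted_text, stored_papers, k=5):
    """Score every stored paper against the submitted text, best match first.

    stored_papers is a hypothetical {paper_id: text} mapping, e.g. rows
    already fetched from the database by the application.
    """
    sub = shingles(submitted_text, k)
    scores = [
        (paper_id, containment(sub, shingles(text, k)))
        for paper_id, text in stored_papers.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A containment score near 1.0 means most of the pasted text appears word for word in a stored paper. Recomputing shingles for every stored paper on each query would be slow for a large collection, so a real system would precompute and index them.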

Related

How to find Sharepoint Library Metadata in the SQL server

I'm working on migrating documents out of the document library into a different system, and I want to export the metadata associated with the documents from SQL Server into the new system.
I'm using SQL Server Management Studio and HeidiSQL to look for these records, but I can't find them anywhere by searching.
This is SharePoint 2010 running on SQL Server 2008.
Any help would be greatly appreciated.
I have googled a lot over the last week and have not been able to come up with anything, since Google is trying to be smarter than my "exact phrase" searches, so it's been pretty frustrating :(
I had this same issue, but ended up using the .NET Client Side Object Model. Pluralsight has some excellent training videos on this. I'm not certain, but I think the metadata is actually stored in a varbinary or some other sort of blob field, so it might not even be feasible to access it by direct query. Also, querying the SharePoint database directly could void your support agreement with Microsoft.
Hope this helps...

Is it possible to use Microsoft 2013 sharepoint search server as my search engine for my site

My site is not written for sharepoint.
It runs on IIS (ASP.NET MVC), communicates over HTTP request/response, and fetches data from a database.
Does it make sense to install and use SharePoint 2013 search for indexing the database (MS SQL) and for free-text querying?
(I know I can use MS Full Text Search but the features and performance are too poor)
(I know I can use Solr/Lucene. It is a great solution indeed. I just wonder if I can do it in MS technologies)
Can I install it as a standalone indexer rather than as part of SharePoint?
If so, how? Will it require SharePoint Foundation search?
Should I install Microsoft Search Server 2010 instead for this feature? Is it as good as SharePoint 2013 search?
Thanks.
Not going to answer your questions one by one, so just skimming through them:
You will not be able to use any of SharePoint's searches without installing SharePoint. There is no separate search server for SP2013 anymore, it's all one product.
So to answer your question three: SP2013 is better than Search Server 2010, as it includes some FAST features which you previously had to pay for. For a complete comparison of what you get with the free version (Foundation), see this page:
SharePoint 2013 feature comparison chart all editions
You can search any publicly available website with the default SharePoint search, and you can also query it through web services or with GET parameters. It would also be possible to search your database directly using BCS (Business Connectivity Services), but the Foundation version is a bit limited there.
I think the main problem is that you would have to install all of SharePoint and maintain it as well. I'm not sure it's worth the hassle of installing the whole product if you only want to use search. This is exactly what Microsoft had intended the Search Server 2010 product for, but they discontinued it.
Your questions quickly answered:
Sure, it's a number one product for search. See Gartner's analysis about this.
Search Server 2010, yes you can. SP2013, no.
2013 includes the FAST search component, which you previously had to pay a lot for. It's better.
My 2 cents: If you only want search, go with a search product like Lucene based products. If you want "more" than just search, or you don't want to get into yet another technology (if you already know some SharePoint) - go with SP.

what is the difference between lucene.NET and DBMS?

I am building a search engine and have finished the first phase, which is spidering (fetching HTML documents and parsing each document to get the other links). Now I must index the content of the HTML documents. At first I decided to use a DBMS (like SQL Server) for this purpose, but then I found a library called Lucene.NET.
What is the difference between Lucene.NET and SQL Server, and which one is better for indexing HTML documents? I read a lot about Lucene.NET and was surprised that it gives better performance than SQL Server. Can anyone explain this to me?
SQL Server is a general purpose RDBMS that is not optimized for very fast text indexing (yes, it has full text indexes, but it does lots of other things at the same time).
Lucene.NET is not an RDBMS, and its main function is fast text indexing.
It is not that surprising that it is better at this than SQL Server.
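To make "text indexing" a little more concrete: libraries in the Lucene family are built around an inverted index, which maps each term to the documents that contain it, so a query only touches the entries for its own terms. The toy sketch below illustrates that idea in Python. It is not Lucene.NET's API or implementation, and the class and method names are made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: term -> set of ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index one document under every term it contains."""
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = None
        for term in terms:
            docs = self.postings.get(term, set())
            # Intersect the posting sets so only documents with all terms remain.
            result = set(docs) if result is None else result & docs
        return result
```

A real engine adds a lot on top of this (relevance scoring, stemming, compressed on-disk segments), but the lookup-by-term structure is the part that makes searching fast.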

CF9's Apache Lucene vs SQL Server's full text search?

ColdFusion 9's full text search is now based on Apache Lucene Solr (or Verity, but it has too many limitations). We also use SQL Server.
Which one's better? Which one's easier?
UPDATE: going to use for... searching against the name & description fields of the Products table.
Thanks!
Here's my 2 cents, tested with ~3,000,000 images with captions (primary key + image caption text of 100 to 500 chars):
CF9's Solr implementation is fast at returning results, really easy to set up, and fairly fast at building the index.
SQL Server 2005 FTS wasn't good enough; I tried it some time ago and didn't put it into production. SQL Server 2008 FTS is much better, though, and we currently use it in our application. But the basic setup had to be adjusted to get good results.
Based on the experiences of colleagues working with huge data sets and applications built mostly around search, I made my top list:
1. Lucene
2. Tuned SQL Server 2008 FTS
3. Solr
4. SQL Server 2005
Of course, CF9's Solr is the winner here if you are after fast setup, since you need only 3 tags to finish the job and get awesome results.
The important question: What are you going to use it for?
Can't pick the right tool for the job when you don't know what the job is ;)

Does the SQL Server 2008 search problem affect SharePoint search?

Does anyone know if the problems that have been affecting Stack Overflow with regards to SQL Server 2008 Full Text Search performance have implications for the search in SharePoint? As far as I understand it SharePoint search uses SQL Server full text search.
SharePoint 2007 has its own search database, to store items such as search scopes and other things.
The actual search index does not use full text search, but stores its information inside a file-based index.
So any search queries run on SharePoint will not cause the issue.
Search crawling of a site is another story, the implementation of which I am not completely sure of. However, most SharePoint sites are not subject to the same transactional throughput that a site such as StackOverflow is hit with.
Moreover, if a SharePoint site were used to host data as transactional as StackOverflow's, very serious performance issues would likely result.
So search in SharePoint 2007 is not going to have the same issue as StackOverflow.
I would not completely rule out some performance hits while a search crawl is running with a SQL 2008 back end, but with decent scheduling and sub-100 GB databases, the issue should not be noticed by users.
I'm not aware of any problem with SQL Server under 2008, but I'm sure it won't affect SharePoint 2007.
Since 2007, SharePoint search no longer uses SQL Search.
If you're running SharePoint 2003, I'm not sure SQL 2008 is supported.

Resources