This might be a very silly question, but please bear with me and help me out.
I have a basic understanding of what Solr is. We have Solr search capability on our website, built in ColdFusion. I have never worked with search on websites before. I did look it up, but I'm not quite clear.
Does it do a web search for the input string?
Or does it do a database search for the string?
Thanks
Solr is a search engine: it aggregates data, stores it in an indexed form, and provides fast lookups. It uses Apache Lucene for indexing.
You can query Solr for a string, and it will return a list of matches, which can then be displayed on your website.
Refer to this presentation for an introduction to Solr.
Note that Solr offers many features to enhance your user experience, e.g. faceted navigation.
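To make the "query Solr for a string" part concrete, here is a minimal sketch of building a Solr select request over HTTP. The host, port, and core name ("products") are placeholders for whatever your ColdFusion site is actually configured with:

```python
from urllib.parse import urlencode

# Assumed Solr location and core name -- adjust to your installation.
SOLR_BASE = "http://localhost:8983/solr/products/select"

def build_solr_query(user_input, rows=10):
    """Build a Solr select URL for a free-text search returning JSON."""
    params = {
        "q": user_input,   # the search string typed by the user
        "rows": rows,      # how many matches to return
        "wt": "json",      # response format
    }
    return SOLR_BASE + "?" + urlencode(params)

url = build_solr_query("coldfusion tutorial")
# The URL can then be fetched (e.g. with urllib.request.urlopen) and the
# JSON response rendered in your website.
print(url)
```

This is the same request your ColdFusion layer is presumably making under the hood: Solr is queried over HTTP, not your database directly.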
Related
It is observed that Google does not provide good indexing through its enterprise search solution, the Google Search Appliance (GSA). But Apache Solr has good indexing capability. Can we use Apache Solr to index documents and then have those documents searched through the GSA server, so that we get the best of both worlds? Kindly give your thoughts.
Can you please provide more details on why you think the GSA "does not provide good indexing"?
The GSA is generally recognised as the best, or at least one of the best, when it comes to result relevancy. For non-web content, Google supplies multiple connectors to let you index that content in the GSA, and if you have a content source that is neither web-based nor covered by one of the Google connectors, it is not difficult to write your own.
So I'm not sure why you think the indexing is not good; it would be really helpful if you could elaborate.
Mohan is incorrect when he says that you cannot serve Solr content via a GSA; you certainly can do this. What you will need to do is create a OneBox module so that you can federate Solr results in real time; they will be presented to the right of the main GSA results.
What is your data source?
If it is a website crawl, then to my limited knowledge the GSA provides more sophisticated crawling/indexing capability for websites than Solr does, because Solr needs an external toolkit such as Tika or Nutch to crawl web resources. The GSA, on the other hand, has its own crawler, which makes crawling simple and effective.
Regarding your question on indexing through Solr and serving through the GSA: it is possible through a OneBox module (refer to BigMikeW's answer).
If you can provide some information about your data sources, it might help people suggest the best way to improve indexing capability in the GSA.
Solr provides an easy way to search documents based on keywords, but I was wondering if it had the ability to return the keywords themselves?
For example, I may want to search for all documents created by Joe Blogs last week and then get a feel for the contents of those documents by the keywords inside them. Or do I have to work out the key words myself and save them in a field?
Assuming by keywords you mean the tokens that Solr generates when parsing a particular field, you may want to review the documentation and examples for the Term Vector Component.
Before implementing it, though, try the Analysis screen of the Solr (4+) Admin WebUI; it has a section that shows the terms/tokens a particular field actually generates.
If these are not quite the keywords that you are trying to produce, you may need to have a separate field that generates those keywords, possibly by using UpdateRequestProcessor in the indexing pipeline.
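As an illustration of what a Term Vector Component request looks like, here is a sketch that builds the request URL, assuming the component is wired to the /tvrh handler (as in Solr's example configuration); the core name "docs" and field name "content" are placeholders:

```python
from urllib.parse import urlencode

# Sketch of a TermVectorComponent request. Assumes the component is
# registered on a /tvrh handler; core and field names are placeholders.
def term_vector_url(doc_query, field="content"):
    params = {
        "q": doc_query,    # e.g. restrict to one document by id
        "tv": "true",      # turn on term vector output
        "tv.tf": "true",   # include term frequencies per term
        "tv.fl": field,    # which field's terms/tokens to return
        "wt": "json",
    }
    return "http://localhost:8983/solr/docs/tvrh?" + urlencode(params)

print(term_vector_url("id:1234"))
```

The JSON response lists the indexed terms of the matched documents, which is the closest thing Solr gives you to "the keywords inside them" without a separate keyword-extraction step.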
Finally, if you are trying to get a feel for the documents in order to do some sort of clustering, you may want to look at Carrot2, which already does this and integrates with Solr.
What you are asking for is known as a "topic model". Solr does not have out-of-the-box support for this, but there are other tools you can integrate to achieve it.
Apache Mahout supports the LDA algorithm, which can be used to model topics. There are several examples of integrating Solr with Mahout; here is one such.
Apache UIMA (Unstructured Information Management Architecture). I won't bother typing about it; instead, here is a brilliant presentation.
I would like to use NLP while indexing data with Apache Solr:
Identify synonyms of the words and index those as well.
Identify named entities and label them while indexing.
When someone queries the Solr index, I should be able to extract the named entities and the intent from the query and form the query string, so that it can effectively search the indexed documents.
Are there any tools/plugins available to satisfy these requirements? I believe this is a common use case for most content-based websites. How do people handle it?
Here's a tutorial on using Stanford NER with Solr.
Check out Apache UIMA
Specifically, if you need Solr to do named entity recognition, you can integrate it with UIMA using SolrUIMA
Check out this talk, which demonstrates UIMA + Solr.
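For the synonym part of the question specifically, you may not need external NLP at all: Solr ships a synonym token filter that can expand synonyms at index time. A minimal schema sketch, where the field type name and the contents of synonyms.txt are placeholders:

```xml
<!-- Hypothetical field type; adapt names and files to your schema -->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Named entity recognition and query intent, by contrast, do need one of the external tools mentioned above (Stanford NER, UIMA/SolrUIMA).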
I would like to be able to search a CouchDB database using Solr. Are there any projects that provide such an integration?
I am also aware of CouchDB-Lucene. Is there a way to hook Solr into that?
Thanks!
It would make more sense to roll your own, given how easy it is. First you need to decide what kind of Solr schema to use and how to map your CouchDB documents onto that schema. Then simply iterate through all the documents in the database (see "Pagination in CouchDB?") and generate Solr <add> documents.
People do this all the time with all kinds of data sources. Since Solr essentially searches a single table, the hard work is often figuring out how to map your database format onto that single table. Read up on what you can do with the Solr schema, and you may be surprised at how easy this is.
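The iterate-and-generate step above can be sketched in a few lines. The field mapping here is hypothetical (it just copies every CouchDB field through verbatim); in practice you would map your documents onto whatever schema you settled on:

```python
import xml.etree.ElementTree as ET

# Sketch: turn CouchDB documents (plain dicts, e.g. from the _all_docs
# feed) into a Solr <add> document. The 1:1 field mapping is a placeholder.
def to_solr_add(couch_docs):
    add = ET.Element("add")
    for d in couch_docs:
        doc = ET.SubElement(add, "doc")
        for name, value in d.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

docs = [{"id": "c1", "title": "CouchDB and Solr"}]
xml = to_solr_add(docs)
# The resulting XML can then be POSTed to Solr's /update handler.
print(xml)
```

Running this periodically (or from CouchDB's changes feed) keeps the Solr index in step with the database.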
There is a CouchDB integration for ElasticSearch available, apart from feeding ElasticSearch with JSON on your own. Both work with schema-less JSON, so it's very easy to integrate them.
In terms of features, ElasticSearch offers a set comparable to Solr's (in addition to some unique features of its own, of course).
According to the CouchDB "Related Projects" wiki page
http://wiki.apache.org/couchdb/Related_Projects
there was a CouchDB-Solr2 project (scroll down to the end), which is no longer maintained.
I am currently collecting information on where I should use Nutch with Solr (domain: vertical web search).
Could you advise me?
Nutch is a framework for building web crawlers and search engines. It can handle the whole process, from collecting web pages to building the inverted index, and it can also push those indexes to Solr.
Solr is mainly a search engine with support for faceted search and many other neat features. But Solr doesn't fetch the data; you have to feed it.
So maybe the first thing to ask in order to choose between the two is whether you already have the data to be indexed (in XML, in a CMS, or in a database). In that case, you should probably just use Solr and feed it that data. On the other hand, if you have to fetch the data from the web, you are probably better off with Nutch.
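If the data is already available, "feeding" Solr can be as simple as POSTing JSON documents to its update handler. A sketch of preparing such a payload, where the core name, update URL, and field names are all placeholders for your setup:

```python
import json

# Sketch: build a JSON payload for Solr's update handler. You would POST
# this to http://localhost:8983/solr/<core>/update?commit=true with
# Content-Type: application/json. Field names are hypothetical.
def to_update_payload(records):
    # Solr's update handler accepts a JSON array of flat documents.
    return json.dumps(
        [{"id": r["id"], "title": r["title"]} for r in records]
    )

payload = to_update_payload([{"id": "1", "title": "Hello Solr"}])
print(payload)
```

With Nutch, by contrast, the crawl/fetch/parse stages produce this feed for you, which is exactly the trade-off described above.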