I am rewriting our company's search functionality to use Solr instead of Compass. Our old code is using CompassQueryBuilder.CompassQueryStringBuilder to build a query out of a list of keywords. The keywords may have spaces in them: for example: "john smith", "tom jones".
Is there an existing facility I can use in Solr to replicate this functionality?
The closest thing I know for SolrJ is the solrj-criteria project. It seems to be currently unmaintained though.
Solr offers a wide variety of querying and indexing options. Fields that contain keywords with spaces in them can be supported by defining a custom field type in the schema configuration file (see here). Queries containing spaced keywords can be handled by specifying a custom QueryParser (see here).
Solr itself doesn't offer a QueryStringBuilder in an API. In fact, Solr doesn't offer any API classes at all, since all interaction is done by posting messages over HTTP. There are client libraries for Java, .NET, PHP, etc. In the SolrNet API there is a SolrMultipleCriteriaQuery, which is quite similar to the CompassQueryStringBuilder.
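If you end up on the Java side with SolrJ, there is no direct equivalent, but the helper is small enough to write yourself. Here is a minimal sketch, assuming you want to OR the keywords against a single field (the field name "name" and the OR semantics are my assumptions, not anything Compass-specific); each keyword is quoted so embedded spaces are treated as a phrase:

    import org.apache.solr.client.solrj.SolrQuery;

    import java.util.List;
    import java.util.stream.Collectors;

    public class KeywordQueryBuilder {

        // OR a list of keywords against one field, quoting each keyword so
        // embedded spaces form a phrase rather than separate terms.
        public static SolrQuery build(String field, List<String> keywords) {
            String q = keywords.stream()
                    .map(k -> field + ":\"" + k.replace("\"", "\\\"") + "\"")
                    .collect(Collectors.joining(" OR "));
            return new SolrQuery(q);
        }

        public static void main(String[] args) {
            SolrQuery query = build("name", List.of("john smith", "tom jones"));
            System.out.println(query.getQuery());
            // name:"john smith" OR name:"tom jones"
        }
    }

SolrJ also ships ClientUtils.escapeQueryChars, which is worth a look if your keywords may contain Lucene query syntax characters.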
I am working on implementing a research web application or portal that integrates different research portals or websites using an open source platform called search kit. The web application will act as a central point of access to research publications on different research portals. To do this, I also need to implement a third-party system that does the following:
Searches for documents based on a user's query across the other research portals and presents or displays the results to the users on my web application.
Indexes the documents.
Can be used by system administrators to configure the web application, so that they can add, remove, or modify the URLs of the websites Solr is pulling documents from.
Displays the results to the user in one standard format.
My question is: can Apache Solr be used to implement the third-party system? If not, what open source platform or approach would you recommend I use to implement it?
In general, Solr seems like a good fit here, but you might need some custom code (apart from configuration) here and there. To go through the points:
Querying is one of the main features of Solr, so this is definitely possible.
Indexing is handled by Solr.
There was a component for Solr called "Data Import Handler" that supported indexing from URLs (see the docs). However, this was removed from the main Solr distribution and moved to a separate package. That package doesn't seem to be actively maintained, so you will probably run into problems if you decide to use it. The alternative is to write the document-pulling code yourself.
Solr can return results in multiple formats, but it might not support the exact format you need. In that case, you will have to build your own transformation on top of the response Solr gives you.
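To make the last three points concrete, here is a minimal SolrJ sketch covering the indexing and query/transform side (the core URL "http://localhost:8983/solr/publications" and the fields "id", "title" and "source_url" are placeholders I made up; fetching and parsing the remote portal pages is the custom code you would still have to write):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrInputDocument;

    public class PortalIndexer {
        public static void main(String[] args) throws Exception {
            // Placeholder core URL; assumes a core/collection named "publications".
            try (SolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/publications").build()) {

                // 1. Index a document pulled from one of the research portals.
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "portal-a-123");
                doc.addField("title", "A study on something");
                doc.addField("source_url", "http://portal-a.example/paper/123");
                solr.add(doc);
                solr.commit();

                // 2. Query and reshape the results into one standard format.
                QueryResponse rsp = solr.query(new SolrQuery("title:study"));
                for (SolrDocument d : rsp.getResults()) {
                    System.out.printf("%s (%s)%n",
                            d.getFieldValue("title"), d.getFieldValue("source_url"));
                }
            }
        }
    }

The printf at the end stands in for whatever "standard format" your portal needs; that part is entirely your own mapping.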
ElasticSearch has percolator for prospective search. Does SOLR have a similar feature where you define your query upfront? If not, is there an effective way of implementing this myself on top of the existing SOLR features?
Besides what BunkerMentality said, it is not hard to build your own percolator. What you need:
Are the queries you want to run easy to model in Lucene-only syntax? If so, you are good; if not, you need to convert them to Lucene syntax first. Build them and keep them in memory as Lucene queries.
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions of docs a day and it worked fine.
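As a rough illustration of that loop, here is a minimal sketch in plain Lucene (the field name "body", the StandardAnalyzer, the classic QueryParser and the two example queries are all assumptions; keep the parsed queries in whatever registry fits your system):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.memory.MemoryIndex;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.Query;

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class SimplePercolator {
        public static void main(String[] args) throws Exception {
            StandardAnalyzer analyzer = new StandardAnalyzer();

            // Registered queries, parsed once and kept in memory.
            Map<String, Query> registered = new LinkedHashMap<>();
            QueryParser parser = new QueryParser("body", analyzer);
            registered.put("coffee-alert", parser.parse("body:starbucks"));
            registered.put("name-alert", parser.parse("body:\"john smith\""));

            // A new doc arrives: index it alone in a MemoryIndex...
            MemoryIndex index = new MemoryIndex();
            index.addField("body", "John Smith ordered a latte at Starbucks", analyzer);

            // ...then run every registered query against that single-doc index.
            for (Map.Entry<String, Query> e : registered.entrySet()) {
                if (index.search(e.getValue()) > 0.0f) {
                    System.out.println("Matched: " + e.getKey());
                }
            }
        }
    }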
It's listed as an open "New Feature" issue, SOLR-4587, on the Solr JIRA, but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this: it's a Solr update processor that is based on Luwak.
With ElasticSearch, an app can point to the alias of an index, instead of the index directly, which makes it easy to switch the index the app uses.
Tire, the equivalent of Sunspot for ES, allows me to interact with aliases.
I can't find anything regarding aliases with Sunspot. How do you handle them in your apps which use Sunspot?
I do not know anything about Sunspot, but as far as Solr is concerned, there was a core alias feature until version 3.1 of Solr. It was removed with SOLR-1637 and was "really, really" removed with SOLR-6169 in version 4.9.
But with the advent of SolrCloud this feature was re-introduced with a better/different implementation (SOLR-4497) in Solr 4.2.
Unfortunately, when skimming through the Sunspot reference I do not find a word about SolrCloud or aliasing. Probably those features have not been adopted by the Sunspot developers? As stated, I do not know Sunspot; maybe they name it differently?
Most likely you will have to get your hands dirty and manage SolrCloud, and consequently aliases, not through the API Sunspot offers, but through Solr's own admin interface and Collections API.
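For example, an alias can be created with the CREATEALIAS action of the Collections API, or from code via the SolrJ wrapper in more recent client versions. A minimal sketch, where the node URL, the alias name "publications" and the collection name "publications_v2" are all placeholders:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class AliasAdmin {
        public static void main(String[] args) throws Exception {
            // Any node of the SolrCloud cluster will do; the URL is a placeholder.
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr").build()) {

                // Point the alias "publications" at the collection "publications_v2".
                // The application keeps querying "publications" while you reindex.
                CollectionAdminRequest.createAlias("publications", "publications_v2")
                        .process(client);
            }
        }
    }

Since Sunspot apparently does not expose this, such a step would have to live outside the Sunspot API, for instance in a deployment or reindexing script.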
Sources of information
There is this old wiki page that covers SolrCloud. It has a small, separate section about creating aliases.
In the official reference is also a section about collection aliases.
The folks at Cloudera, who donated the feature to Solr, have also written a blog post about it.
I think the title is self-explanatory.
I don't see anything on the Apache Solr wiki that suggests you can maintain the schema of an Apache Solr instance using the REST API, but maybe (hopefully) you know something I don't.
I just found a section on the Solr wiki where they describe this exact feature for release 4.4 (which has not been released yet).
It does require some prerequisite configuration on the Solr instance, but it does allow you to add fields to the schema. Based on that information, I can't see why they wouldn't eventually extend the functionality to allow deletes as well. I guess we will have to wait and see.
Here is the link to that section: http://wiki.apache.org/solr/SchemaRESTAPI#Adding_fields_to_a_schema. It also references this JIRA issue: "In preparation for dynamic schema modification via REST API, add a "managed" schema facility".
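For what it's worth, in later releases this grew into the managed Schema API with an "add-field" command, and SolrJ gained a wrapper for it. A minimal sketch of adding a field that way (the core URL and the field definition are placeholders, and the core has to be configured with a managed schema; the wiki-era 4.4 endpoint differed slightly from this later form):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.schema.SchemaRequest;

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class AddSchemaField {
        public static void main(String[] args) throws Exception {
            // Placeholder core URL; the core must use a managed schema.
            try (SolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycore").build()) {

                Map<String, Object> field = new LinkedHashMap<>();
                field.put("name", "author");
                field.put("type", "string");
                field.put("stored", true);

                // Sends an "add-field" command to the core's Schema API endpoint.
                new SchemaRequest.AddField(field).process(solr);
            }
        }
    }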
I am currently using Apache Solr to build a search engine. The queries in Solr are of the field:value format. Now I want to use a part-of-speech tagger to separate the subject, verb and predicate and search the values in each field. For example, if I input "Who likes Starbucks" then I need some code to give me "q=subject:*&verb=likes&object=starbucks". Is there any library that can handle this job? Thank you!
I think several people have used UIMA for this; see the Solr wiki.
There are a number of POS taggers. Here is another StackOverflow posting about this: What is a good Java library for Parts-Of-Speech tagging?
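To give an idea of the glue code involved, here is a rough sketch using Apache OpenNLP as the tagger (the model file path, the field names subject/verb/object, and the tag-to-field mapping are all my assumptions; a real system would need proper parsing rather than this crude heuristic):

    import opennlp.tools.postag.POSModel;
    import opennlp.tools.postag.POSTaggerME;
    import opennlp.tools.tokenize.WhitespaceTokenizer;

    import java.io.FileInputStream;
    import java.io.InputStream;

    public class QuestionToSolrQuery {
        public static void main(String[] args) throws Exception {
            // "en-pos-maxent.bin" is a pre-trained OpenNLP model downloaded separately.
            try (InputStream modelIn = new FileInputStream("en-pos-maxent.bin")) {
                POSTaggerME tagger = new POSTaggerME(new POSModel(modelIn));

                String[] tokens = WhitespaceTokenizer.INSTANCE.tokenize("Who likes Starbucks");
                String[] tags = tagger.tag(tokens);

                StringBuilder q = new StringBuilder();
                for (int i = 0; i < tokens.length; i++) {
                    // Crude mapping: wh-words -> wildcard subject, verbs -> verb field,
                    // proper nouns -> object field.
                    if (tags[i].startsWith("W")) {
                        q.append("subject:* ");
                    } else if (tags[i].startsWith("VB")) {
                        q.append("verb:").append(tokens[i].toLowerCase()).append(' ');
                    } else if (tags[i].startsWith("NNP")) {
                        q.append("object:").append(tokens[i].toLowerCase()).append(' ');
                    }
                }
                System.out.println(q.toString().trim());
                // e.g. subject:* verb:likes object:starbucks
            }
        }
    }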