I am learning Solr and want to use it for stemming words. I'll pass a word to Solr and it should send the stemmed word back. I know how to configure a Solr core for different stemming patterns, and I can view the stemmed words in the analysis screen of the Solr Admin UI, but I am not sure how to achieve this from Java code. I am able to index and query using the Java API.
I am using Solr 5.3.0.
If you only need to stem words, I would recommend not using the whole of Solr. Just use the code it uses for stemming, or something similar. For example, you can use
org.apache.lucene.analysis.en.PorterStemmer.stem(String)
Unfortunately, PorterStemmer has package-level access, so I would just copy it from the Lucene sources, or you can search the Internet for other stemmer implementations. I hope that helps.
Good luck!
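For illustration only, here is a minimal pure-Java sketch of what a stem(String) call does, covering just Step 1a of Porter's algorithm (plural suffixes). It is not Lucene's PorterStemmer, which implements all of the steps, and the class name PorterStep1a is hypothetical. The shape is the same as the call mentioned above, though: a word goes in, a stem comes out.

```java
import java.util.Locale;

/**
 * Hypothetical helper: ONLY Step 1a of Porter's algorithm (plural suffixes).
 * A full Porter stemmer has further steps (1b, 1c, 2 to 5) that are omitted here.
 */
final class PorterStep1a {
    private PorterStep1a() {}

    static String stem(String word) {
        // Porter's rules are defined on lowercase letters.
        String w = word.toLowerCase(Locale.ROOT);
        if (w.endsWith("sses")) {
            return w.substring(0, w.length() - 2); // sses -> ss
        }
        if (w.endsWith("ies")) {
            return w.substring(0, w.length() - 2); // ies -> i
        }
        if (w.endsWith("ss")) {
            return w;                              // ss -> ss (unchanged)
        }
        if (w.endsWith("s")) {
            return w.substring(0, w.length() - 1); // s -> (removed)
        }
        return w;                                  // no plural suffix
    }
}
```

For example, PorterStep1a.stem("ponies") returns "poni" and PorterStep1a.stem("caresses") returns "caress", which are among the Step 1a examples in Porter's paper.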
Does anybody know how to use Lucene and Solr together in the same Sitecore installation?
Sitecore states that this is possible here:
https://doc.sitecore.net/sitecore_experience_platform/setting_up__maintaining/search_and_indexing/indexing/using_solr_or_lucene
You can mix Lucene and Solr, and, for example, use Solr for xDB and
Lucene for content search at the same time. If an index is small, it
is much easier to manage as a Lucene index because there is little to
no overhead to set it up.
But there is no reference on how to configure it.
Any advice is welcome.
Cheers!
In other words, your analytics indexes will use Solr and your content search indexes will use Lucene.
To configure your analytics indexes to use Solr, see the following Sitecore documentation: https://doc.sitecore.net/sitecore_experience_platform/setting_up__maintaining/xdb/configuring_servers/configure_a_processing_server#_Solr_configuration
By default, Sitecore is already configured to use Lucene for Content Search, so no change is required there.
However, I am not sure that Solr and Lucene can both be used for Content Search (or both for xDB) at the same time, because of how the configuration is structured. For example, Content Search uses the master, web and core index configurations. If you decide to use Solr for Content Search, you will need to disable the Lucene configuration in the Include folder.
Thanks
Elasticsearch has the percolator for prospective search. Does Solr have a similar feature, where you define your queries up front? If not, is there an effective way to implement this myself on top of existing Solr features?
Besides what BunkerMentality said, it is not hard to build your own percolator. What you need:
Are the queries you want to run easy to model in plain Lucene syntax? If so, you are good; if not, you need to convert them to plain Lucene queries. Build them once and keep them in memory as Lucene queries.
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions of docs a day and it worked fine.
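The steps above can be sketched without Lucene to show the control flow. In this toy version (all names are hypothetical), a stored query is just a set of terms that must all appear, and a HashSet of the incoming document's lowercased tokens stands in for the single-document MemoryIndex. With real Lucene you would keep parsed Query objects and build a MemoryIndex per document instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Toy percolator (hypothetical, not a Lucene/Solr API).
 * Assumptions: a query is a conjunction of terms (all must be present), and a
 * HashSet of the document's lowercase tokens plays the role of MemoryIndex.
 */
final class ToyPercolator {
    // Registered queries, kept in memory and in registration order.
    private final Map<String, Set<String>> queries = new LinkedHashMap<>();

    /** Registers a query up front; its terms are lowercased once here. */
    void register(String id, String... terms) {
        Set<String> required = new HashSet<>();
        for (String t : terms) {
            required.add(t.toLowerCase(Locale.ROOT));
        }
        queries.put(id, required);
    }

    /** Returns the ids of all stored queries matching this one document. */
    List<String> percolate(String document) {
        // Step 1: build the one-document "index" (its set of lowercase tokens).
        Set<String> docTerms = new HashSet<>(
                Arrays.asList(document.toLowerCase(Locale.ROOT).split("\\W+")));
        // Step 2: run every stored query against that tiny index.
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> q : queries.entrySet()) {
            if (docTerms.containsAll(q.getValue())) {
                matches.add(q.getKey());
            }
        }
        return matches;
    }
}
```

Each incoming document costs one pass over all stored queries, so in this design the number of stored queries, not the size of the corpus, drives the per-document cost.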
It's listed as an open New Feature issue, SOLR-4587, in the Solr JIRA, but it doesn't look like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this:
It's a Solr update processor based on Luwak.
In short, I need to search my Riak buckets via Solr. The only problem is that by default Solr searches are case-sensitive. After some digging, I see that I need to write a custom Solr text analyzer schema. Does anyone have any good references for writing search analyzer schemas?
And finally, when installing a new schema for an index, is it necessary to re-index all objects in a bucket so that previously stored objects show up in searches that use the new schema?
RTFM fail... I swear, though, getting to this page was not easy:
http://docs.basho.com/riak/latest/dev/advanced/search-schema/#Defining-a-Schema
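To make the second question concrete, here is a toy Java model (not Riak or Solr code; all names are hypothetical) of two points: the same lowercasing has to happen at index time and at query time, and changing the analysis generally requires re-indexing, because terms already in the index were produced by the old analysis and are not rewritten. I haven't checked Riak's exact re-indexing procedure; the linked page is the place for that.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

/**
 * Toy model: stored objects are the source of truth; the index only holds
 * terms produced by whichever "analyzer" was active when they were indexed.
 */
final class ToyIndex {
    /** No normalization: "Riak" and "riak" are different terms. */
    static final UnaryOperator<String> CASE_SENSITIVE = UnaryOperator.identity();
    /** Lowercasing, the job a lowercase filter does in a real schema. */
    static final UnaryOperator<String> LOWERCASE = s -> s.toLowerCase(Locale.ROOT);

    private final List<String> storedObjects = new ArrayList<>(); // like objects in a bucket
    private final Set<String> indexedTerms = new HashSet<>();
    private UnaryOperator<String> analyzer;

    ToyIndex(UnaryOperator<String> analyzer) {
        this.analyzer = analyzer;
    }

    /** Stores the object and indexes the term produced by the current analyzer. */
    void add(String value) {
        storedObjects.add(value);
        indexedTerms.add(analyzer.apply(value));
    }

    /** A "schema change": affects future indexing and queries, not existing terms. */
    void setAnalyzer(UnaryOperator<String> newAnalyzer) {
        this.analyzer = newAnalyzer;
    }

    /** Rebuilds every indexed term from the stored objects with the current analyzer. */
    void reindex() {
        indexedTerms.clear();
        for (String value : storedObjects) {
            indexedTerms.add(analyzer.apply(value));
        }
    }

    /** The query is run through the same analyzer before the lookup. */
    boolean search(String query) {
        return indexedTerms.contains(analyzer.apply(query));
    }
}
```

In this model, after switching to LOWERCASE without calling reindex(), a search for "riak" still misses the object indexed as "Riak"; only after reindex() do "riak" and "RIAK" both match.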
When I search for particular content, the results show the file that contains it. How can I show the line in which that content appears?
I know Alfresco uses Lucene. Can I use the Lucene highlighter? If so, how do I use it in Alfresco?
What about Solr? Can I use that?
Alfresco 4.2.e without modifications means that you're using Solr.
AFAIK there is no add-on that adds hit highlighting to Alfresco's Solr search subsystem.
It's on the roadmap.
There are quite a few posts about hit highlighting in Alfresco based on Lucene.
Alfresco 5.2 seems to have this feature: the searched-for string is highlighted, with context, in the search results.
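Where no highlighter is available, a naive way to show "the line in which the content is" is to post-process the document text yourself. This is a hypothetical helper, not an Alfresco, Solr or Lucene API: plain case-insensitive substring matching that returns each matching line with the hits wrapped in <em> tags. It does none of the analysis (tokenizing, stemming) a real highlighter does, and it assumes you can already get the document's plain text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical "show me the line" helper. Assumes lowercasing does not change
 * string length (true for ASCII text), so offsets found in the lowercased line
 * can be reused on the original line.
 */
final class LineHighlighter {
    private LineHighlighter() {}

    static List<String> matchingLines(String text, String term) {
        List<String> result = new ArrayList<>();
        String needle = term.toLowerCase(Locale.ROOT);
        if (needle.isEmpty()) {
            return result;
        }
        for (String line : text.split("\\R")) { // \R matches any line break
            String lower = line.toLowerCase(Locale.ROOT);
            int hit = lower.indexOf(needle);
            if (hit < 0) {
                continue; // line does not contain the term
            }
            StringBuilder sb = new StringBuilder();
            int from = 0;
            while (hit >= 0) {
                sb.append(line, from, hit)                   // text before the hit
                  .append("<em>")
                  .append(line, hit, hit + needle.length()) // original casing kept
                  .append("</em>");
                from = hit + needle.length();
                hit = lower.indexOf(needle, from);
            }
            sb.append(line.substring(from));                 // rest of the line
            result.add(sb.toString());
        }
        return result;
    }
}
```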
Hey Guys,
I'm trying to implement search functionality in an application we're writing.
Solr 1.4.1 running on Tomcat7
JDBC connection to an MS SQL Server view that I'm indexing
Solr has finished indexing and the index is working.
To search and communicate with Solr, I have created a small test WCF service (to be integrated into our main service later).
The purpose is to implement a textfield in our main application. In this text field the users can start typing something like Paintbrush and gradually filter through the list of objects as more and more characters are input.
This works just fine and dandy with Solr up to a certain point. I'm using the wildcard asterisk at the end of my query, so I'm throwing a lot of requests like
p*
pa*
pain*
paint*
etc. at the server, and it's returning results just fine (quite impressively fast, actually). The only problem is that once the user types the whole word, the query is paintbrush*, at which point Solr returns 0 results.
So it seems that query+wildcard can only match query+something and not query+nothing.
I managed to get this working with Lucene.Net, but Solr doesn't seem to do things the same way.
Any advice you can give me on implementing such a feature?
There isn't much code to look at since I'm using SolrNet: http://pastebin.com/tXpe4YUe
I figure it has something to do with the analyzer and the query parser, but I'm not far enough into Solr yet to know where to look :)
I wouldn't implement suggestions with prefix wildcard queries in Solr. There are other mechanisms better suited to do this. See:
Simple Solr schema problem for autocomplete
Solr TermsComponent: Usage of wildcards
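The idea behind the TermsComponent route is a lookup in the sorted term dictionary: all terms that start with a given prefix sit next to each other, so you seek to the prefix and read forward. Here is a pure-Java sketch of that idea (PrefixSuggester is a hypothetical name; in Solr you would ask TermsComponent for terms via parameters such as terms.fl and terms.prefix instead). In this sketch, a term equal to the prefix itself is included in the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Prefix suggestions over a sorted term dictionary (toy stand-in, not Solr code). */
final class PrefixSuggester {
    private final NavigableSet<String> terms = new TreeSet<>();

    void addTerm(String term) {
        terms.add(term);
    }

    /** Returns up to {@code limit} indexed terms starting with {@code prefix}, in sorted order. */
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        // tailSet(prefix, true) begins at the first term >= prefix; because the set is
        // sorted, every term starting with prefix forms one contiguous run from there.
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix) || out.size() >= limit) {
                break; // left the run of matching terms, or hit the limit
            }
            out.add(t);
        }
        return out;
    }
}
```

With the terms paint, paintbrush, painter, pan and zebra, suggest("pain", 10) yields paint, paintbrush and painter, and suggest("paintbrush", 10) still yields paintbrush.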
Stemming seems to be what caused the problem. I fixed it by using a clone of text_ws instead of text as the field type.
My changes to schema.xml: http://pastebin.com/xaJZDgY4
Stemming is disabled and lowercase indexing is enabled. As long as all queries are in lowercase, they should always return results (if there are any matches at all).
The issue seems to be that analyzers aren't applied to wildcard queries, so the logic that would make Johnny the result of Johni or Johnni is "broken" when using wildcards.
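Here is a toy reproduction of that behaviour (hypothetical names; only Step 1a of Porter's algorithm stands in for the stemming filter in the text field type). At index time "Ponies" is lowercased and stemmed to "poni", but the wildcard prefix is compared exactly as typed, so "pon*" matches while "ponies*" does not. Without stemming "ponies*" matches, but an uppercase prefix still misses.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Toy model: analyzed index terms vs. an unanalyzed wildcard prefix. */
final class StemmedWildcardDemo {
    private StemmedWildcardDemo() {}

    /** Porter Step 1a only (sses->ss, ies->i, ss->ss, s->""); stand-in for a real stemmer. */
    static String stem(String word) {
        if (word.endsWith("sses") || word.endsWith("ies")) {
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("ss")) {
            return word;
        }
        if (word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Index-time analysis: lowercase, then stem only if stemming is enabled. */
    static String analyze(String token, boolean stemming) {
        String lower = token.toLowerCase(Locale.ROOT);
        return stemming ? stem(lower) : lower;
    }

    /** Wildcard-style prefix match: the typed prefix is used as-is, without analysis. */
    static List<String> prefixQuery(List<String> indexedTerms, String typedPrefix) {
        return indexedTerms.stream()
                .filter(t -> t.startsWith(typedPrefix))
                .collect(Collectors.toList());
    }
}
```

On the client side this suggests lowercasing the user's input yourself before appending the *, since nothing on the wildcard path will do it for you in this model.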
If you're facing similar problems and my solution here doesn't quite work, you can add debugQuery=on to your query string to see more of what's going on. That helped me narrow down the problem.