Hey guys,
I'm trying to add some search functionality to an application we're writing.
Solr 1.4.1 running on Tomcat 7
JDBC connection to an MS SQL Server with the view I'm indexing
Solr has finished indexing and the index is working.
To search and communicate with Solr I have created a little test WCF service (to be integrated with our main service later).
The purpose is to implement a text field in our main application. In this text field the users can start typing something like Paintbrush and gradually filter the list of objects as more and more characters are typed.
This works just fine and dandy with Solr up to a certain point. I'm appending the wildcard asterisk to the end of my query, so I'm throwing a lot of requests like
p*
pa*
pain*
paint*
etc. at the server, and it returns results just fine (quite impressively fast, actually). The only problem is that once the user types the whole word, the query is paintbrush*, at which point Solr returns 0 results.
So it seems that query+wildcard can only be query+something and not query+nothing.
I managed to get this working under Lucene.Net, but Solr doesn't seem to handle it the same way.
Any advice you can give me on implementing such a feature?
There isn't much code to look at since I'm using SolrNet: http://pastebin.com/tXpe4YUe
I figure it has something to do with the Analyzer and Parser, but I'm not yet far enough into Solr to know where to look :)
I wouldn't implement suggestions with prefix wildcard queries in Solr. There are other mechanisms better suited to do this. See:
Simple Solr schema problem for autocomplete
Solr TermsComponent: Usage of wildcards
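One common alternative is an EdgeNGram-indexed field, which precomputes every prefix at index time so the query side needs no wildcard at all. A sketch of what such a fieldType might look like in schema.xml (the type name and gram sizes are illustrative, not taken from the asker's setup):

```xml
<!-- Illustrative autocomplete field type; name and gram sizes are arbitrary -->
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index "paintbrush" as p, pa, pai, ... paintbrush -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, a plain query for paint (no asterisk) matches the stored prefix grams, and the full word paintbrush matches its own full-length gram, avoiding the 0-results edge case.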
Stemming seems to be what caused the problem. I fixed it by using a clone of text_ws instead of text for the field type.
My changes to schema.xml: http://pastebin.com/xaJZDgY4
Stemming is disabled and lowercase indexing is enabled. As long as all queries are in lower case, they should always give results (if they are there at all).
The issue seems to be that analyzers don't work with wildcards, so the logic that would make Johnny a result for Johni or Johnni is "broken" when using wildcards.
If you're facing similar problems and my solution here doesn't quite work, you can add debugQuery=on to your query string and see a bit more about what's going on. That helped me narrow down the problem.
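The shape of such a clone is roughly this (a whitespace tokenizer plus lowercasing, with no stemming filter; the type name here is a placeholder, not the one from the pastebin):

```xml
<!-- text_ws clone with lowercasing and no stemming filter -->
<fieldType name="text_ws_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same analysis applies at index and query time and nothing rewrites the tokens, paintbrush* now matches the indexed term paintbrush.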
With Hybris Commerce 6.7, which ships with Solr 7.7, I'm finding it very difficult to configure Solr to meet business expectations for the "Did you mean" suggestion. I read many articles about this and found many configuration parameters. Based on all of those, with meaningful changes, I expected to get something working. Unfortunately, I'm still in search of the particular configuration or approach that retailers like Flipkart or Amazon use. Below are the points that troubled me a lot.
To my knowledge, spellcheck works per word of the entire search phrase. If a user searches with a single word that has an understandable spelling mistake, Solr is able to find the correct word easily, e.g. Telvison, mobbile etc.
If a user searches a multi-word phrase, in some instances Hybris Solr is not able to bring back any suggestion. Sometimes it shows suggestions with no real-world existence. E.g. if you misspell aple watch, it suggests apple water. For bode speaker, it suggests body speaker. For water heatre, it suggests skywater theatre. For samsung note 10 lite, it suggests samsung note 10 litre. For red apple wath, red apple with is suggested. For red apple watch, it shows led apple watch. And there are many more. Isn't it ridiculous?
I tried adding a WordBreakSolrSpellChecker dictionary alongside the existing DirectSolrSpellChecker; it didn't affect the suggestions. I doubt whether Hybris allows this.
I also tried a FileBasedSpellChecker dictionary to maintain a separate text file, but Hybris seems to have a hard dependency that doesn't allow such changes.
I changed the dictionary to IndexBasedSpellChecker, but it threw an exception in the Solr admin console.
After playing with the collate parameters, I figured out that Solr gives me suggestions that don't have any direct search result (product). FYI, I used phrase search (not free text) in my search implementation.
Standalone Solr offers many parameters. I studied and implemented those, but I remain stuck, although conceptually they should work.
Can anyone please guide me on how I should proceed? If you want, I can share my Solr configuration.
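For reference, the combined DirectSolrSpellChecker + WordBreakSolrSpellChecker setup I tried looked roughly like this (the field name and values here are placeholders, not my exact Hybris config):

```xml
<!-- Placeholder sketch; field name and thresholds are illustrative -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">spellcheck_text</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spellcheck_text</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">2</int>
  </lst>
</searchComponent>
```

The request then passes both dictionaries (spellcheck.dictionary=default and spellcheck.dictionary=wordbreak) together with spellcheck.collate=true and a spellcheck.maxCollationTries value, so that collations are verified to actually return results, which is what I expected to filter out nonsense suggestions.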
I am learning Solr and want to use it for stemming words. I'll pass a word to Solr and it should send the stemmed word back. I know how to configure a Solr core for different stemming patterns, and I am able to view the stemmed words in the analyzer (Solr admin UI), but I am not sure how to achieve this from Java code. I am able to index and query using the Java API.
I am using solr-5.3.0.
If you just need to stem words, I would recommend not using the whole of Solr. Just use the code it uses for stemming, or something similar. E.g. you can use
org.apache.lucene.analysis.en.PorterStemmer.stem(String)
Unfortunately, PorterStemmer has package-level access, so I would just copy it from the sources, or you can search the Internet for other stemmer implementations. I hope that helps.
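For illustration, this is the kind of small static helper you end up with after copying a stemmer class. Note this is a deliberately naive suffix-stripper written for the example, not the real Porter algorithm, which handles many more cases:

```java
// Deliberately naive suffix-stripping stemmer, for illustration only.
// The real PorterStemmer handles far more cases; copy that class if you need it.
public class NaiveStemmer {

    public static String stem(String word) {
        String w = word.toLowerCase();
        if (w.endsWith("ing") && w.length() > 5) {
            w = w.substring(0, w.length() - 3);
            // undouble a trailing consonant: "runn" -> "run"
            int n = w.length();
            if (n >= 2 && w.charAt(n - 1) == w.charAt(n - 2)) {
                w = w.substring(0, n - 1);
            }
        } else if (w.endsWith("es") && w.length() > 4) {
            w = w.substring(0, w.length() - 2);
        } else if (w.endsWith("s") && w.length() > 3) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(stem("running")); // run
        System.out.println(stem("paints"));  // paint
    }
}
```

A copied PorterStemmer would slot in behind the same kind of static method, with no Solr server involved at all.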
Good luck!
ElasticSearch has a percolator for prospective search. Does SOLR have a similar feature, where you define your query upfront? If not, is there an effective way of implementing this myself on top of the existing SOLR features?
Besides what BunkerMentality said, it is not hard to build your own percolator. What you need:
Are the queries you want to run easy to model in Lucene-only syntax? If so, you are good; if not, you need to convert them to Lucene syntax. Build them, and keep them in memory as Lucene queries.
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions of docs a day and it worked fine.
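To make the steps concrete, here is a conceptual sketch of that flow in plain Java. Real Lucene queries and MemoryIndex are replaced by simple predicates and a Map, so the names and structure are mine, not Lucene's:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual percolator: register the queries up front, then match each
// incoming document against all of them. In the real thing, Query would be
// a Lucene query and percolate() would build a one-doc MemoryIndex.
public class Percolator {

    // Stand-in for a Lucene query: a predicate over a document's fields.
    public interface Query {
        boolean matches(Map<String, String> doc);
    }

    private final Map<String, Query> queries = new LinkedHashMap<>();

    public void register(String name, Query query) {
        queries.put(name, query);
    }

    // "When a doc arrives": run every stored query against the single doc
    // and return the names of the queries that matched.
    public List<String> percolate(Map<String, String> doc) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Query> e : queries.entrySet()) {
            if (e.getValue().matches(doc)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Percolator p = new Percolator();
        p.register("mentions-solr", d -> d.getOrDefault("body", "").contains("solr"));
        p.register("title-alert", d -> d.getOrDefault("title", "").startsWith("ALERT"));

        Map<String, String> doc = Map.of("title", "ALERT: outage", "body", "solr is down");
        System.out.println(p.percolate(doc)); // [mentions-solr, title-alert]
    }
}
```

Swapping the predicate for a real Lucene Query and the Map for a MemoryIndex containing the single incoming doc gives you the approach described above.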
It's listed as an open new feature, SOLR-4587, on Solr JIRA but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to percolator.
If it is still relevant, you can use this:
It's a SOLR update processor that is based on Luwak.
I need a simple way to read OpenGrok's DB from a PHP script to do some weird searches (doing that in Java inside OpenGrok itself is beyond my abilities). So I decided to use Solr as a way to query the Lucene DB directly from another language (probably PHP or C).
The problem is that when I point Solr to /var/opengrok/data, it bombs out with:
java.lang.RuntimeException: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory#/var/opengrok/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory#3a329572: files: [] at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1103)
(etc, etc, the backtrace is about three screens long)
I tried to point it somewhere inside data with no luck. The structure looks like this:
/var/opengrok/data/index/$projname/segment*
/var/opengrok/data/spelling...
and it seems that whatever Solr is using expects the segment files directly in the index directory.
I checked whether there was a version discrepancy, but OpenGrok 0.11 uses Lucene 3.0.2 and I've set Solr to LUCENE_30 as the database version.
Any pointers will be greatly appreciated; Google didn't seem to be able to help with this.
OpenGrok's web interface can consume any well-formed search query (through the URL) and reply with XHTML results, which are easily parseable, so you're probably making this too complex by hacking inside Lucene rather than using the provided UI...
I am fiddling with Sunspot and Solr, through sunspot_mongoid. Everything seems to work fine, but I am not getting search results back.
The Solr admin on http://0.0.0.0:8982/solr/admin/ tells me that there are items indexed, though I have too little knowledge to interpret the exact indexes there. Also, searching through that interface does not give me results either.
I am rather new to Solr: I have implemented it successfully with a "generic" ActiveRecord/MySQL Rails app in the past, but not with MongoID.
The problem might be anywhere: not correctly indexed, not correctly retrieved, not correctly passed through Sunspot, and so on.
Is it a good idea to start on the Solr side first? Throw some requests at it over HTTP to see if it is actually indexing stuff? If so, how?
Or should I fiddle in Rails first, and see if it is getting some XML back but parsing or interpreting it wrong?
I was having the same problem, and I also noticed that it wouldn't reindex my results. Then I found the sunspot_mongo gem. Use it instead of sunspot_mongoid.
In your gemfile
gem 'sunspot_mongo'
Then in your models
include Sunspot::Mongo
Then instead of calling search in your controllers do:
Model.solr_search do
fulltext params[:q]
end
Also, to reindex, run rake sunspot:mongo:reindex.
I think rake sunspot:reindex just tries to index the sqlite database that you're probably not using.