I am fiddling with Sunspot and Solr, through sunspot_mongoid. Everything seems to work fine, but I am not getting search results back.
The Solr admin on http://0.0.0.0:8982/solr/admin/ tells me that there are items indexed, though I have too little knowledge to interpret the exact indexes there. Also, searching through that interface does not give me results either.
I am rather new to Solr: I have implemented it successfully with a "generic" ActiveRecord/MySQL Rails app in the past, but not with Mongoid.
The problem might be anywhere: not correctly indexed, not correctly retrieved, not correctly passed through Sunspot, and so on.
Is it a good idea to start at the Solr side first? Throw some requests at it over HTTP to see whether it is actually indexing anything? If so, how?
Or should I fiddle in Rails first, to see if it is getting some XML back but parsing or interpreting it wrong?
So I was having the same problem, and I also noticed that it wouldn't reindex my records. Then I found the sunspot_mongo gem. Use it instead of sunspot_mongoid.
In your Gemfile:
gem 'sunspot_mongo'
Then in your models
include Sunspot::Mongo
Then instead of calling search in your controllers do:
Model.solr_search do
fulltext params[:q]
end
Also, to reindex, run rake sunspot:mongo:reindex
I think rake sunspot:reindex just tries to index the SQLite database that you're probably not using.
I have indexes from some Solr cores that I converted from Solr 4 to Solr 6, but in standalone mode, so they don't have the _version_ field that SolrCloud requires.
Now I want to migrate to SolrCloud 6 and need to put them into a cluster. Because the _version_ field does not exist in these indexes, when I put them under a SolrCloud leader core's data directory, the replicas in the shard did not update, as far as I could see. So I decided to read them with Lucene, get each doc's fields, add them to a SolrInputDocument, and then push them into SolrCloud doc by doc. But because some fields in these indexes are not stored, not all of the fields make it across.
In the end it seems there is no way for me other than reindexing.
I would appreciate any better idea or solution that could help me migrate more easily.
If there is any chance to reindex, just do so; it's going to be the best option in the end (otherwise you have to deal with two separate issues: a) migrating from 4.x to 6.0 and b) moving from standalone to SolrCloud... it's going to be messy).
If you cannot reindex:
Are all your fields stored OR do they all have docValues=true? If so, you can get the original contents of your docs. Read them and index them with SolrJ or with some script (see the sketch after this list).
If not, and you have a _version_ field: try to manually put the index into SolrCloud. Not straightforward, but possible.
If you don't have a _version_ field, I think it is impossible to put the index as-is into SolrCloud (although some posts on the net make you think it is). You could try to write some Lucene code to add a _version_ field to all docs (with values that make sense), but this should be the very last resort.
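A minimal sketch of the SolrJ route, assuming all fields are stored and the old standalone core is still reachable over HTTP. The URLs, core/collection names, unique-key field, and paging approach below are illustrative assumptions, not something from the original answer:

// Hedged sketch: copy stored documents from a standalone core into a SolrCloud collection.
// URLs, names, and the "id" unique key are assumptions; the Builder API differs between SolrJ versions.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class CopyIndex {
    public static void main(String[] args) throws Exception {
        HttpSolrClient source = new HttpSolrClient.Builder("http://oldhost:8983/solr/oldcore").build();
        CloudSolrClient target = new CloudSolrClient.Builder().withZkHost("zkhost:2181").build();
        target.setDefaultCollection("newcollection");

        int rows = 500;
        int start = 0;
        long numFound;
        do {
            SolrQuery q = new SolrQuery("*:*");
            q.setStart(start);
            q.setRows(rows);
            q.setSort("id", SolrQuery.ORDER.asc);      // stable order while paging through the old core
            QueryResponse rsp = source.query(q);
            numFound = rsp.getResults().getNumFound();
            for (SolrDocument d : rsp.getResults()) {
                SolrInputDocument in = new SolrInputDocument();
                for (String f : d.getFieldNames()) {
                    if (!"_version_".equals(f)) {       // let SolrCloud assign its own _version_
                        in.addField(f, d.getFieldValue(f));
                    }
                }
                target.add(in);
            }
            start += rows;
        } while (start < numFound);

        target.commit();
        source.close();
        target.close();
    }
}

Note that this only recovers stored field values; anything that was indexed-only is lost, which is why reindexing from the original source remains the cleaner option.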
ElasticSearch has a percolator for prospective search. Does Solr have a similar feature where you define your query upfront? If not, is there an effective way of implementing this myself on top of the existing Solr features?
Besides what BunkerMentality said, it is not hard to build your own percolator. What you need:
Are the queries you want to run easy to model in Lucene-only syntax? If so, you are good; if not, you need to convert them to Lucene-only queries first. Build them, and keep them in memory as Lucene queries.
When a doc arrives:
build a MemoryIndex containing only that single doc
run all your queries on the index
I have done this for a system ingesting millions of docs a day and it worked fine.
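A minimal sketch of that idea with Lucene's MemoryIndex; the class name, field name, and the way queries are supplied here are illustrative assumptions:

// Hedged sketch: match a single incoming document against a set of pre-built Lucene queries.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.search.Query;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class Percolator {
    private final StandardAnalyzer analyzer = new StandardAnalyzer();
    private final Map<String, Query> registeredQueries;   // parsed once, kept in memory

    public Percolator(Map<String, Query> registeredQueries) {
        this.registeredQueries = registeredQueries;
    }

    // Returns the ids of all registered queries that match the incoming document text.
    public List<String> percolate(String fieldName, String docText) {
        MemoryIndex index = new MemoryIndex();             // in-memory index holding only this one doc
        index.addField(fieldName, docText, analyzer);
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Query> e : registeredQueries.entrySet()) {
            if (index.search(e.getValue()) > 0.0f) {       // a score above 0 means the query matched
                matches.add(e.getKey());
            }
        }
        return matches;
    }
}

The per-document cost is just building the MemoryIndex and looping over the stored queries, which is why this scales to a high ingest rate.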
It's listed as an open new feature, SOLR-4587, on the Solr JIRA, but it doesn't seem like any work has started on it yet.
There is a link in the comments there to a separate project called Luwak that seems to implement some features similar to the percolator.
If it is still relevant, you can use this.
It's a Solr update processor based on Luwak.
I am new to Solr and have a couple of questions; I would appreciate help from more experienced people:
I am able to get the example running, but what exactly is start.jar?
I know that by running "java -jar start.jar" I can start Solr. But do I run this command after I index my own data, rather than the given sample data? If not, what should I do to run my own Solr instance with my own indexed data?
I need to index my own sample data, not related to the given Solr example at all. How exactly should I do it? Should I copy the example directory and then modify the fields in schema.xml? Should I then run post.sh accordingly to index the data, like I did to set up the example Solr?
Thanks a lot for your help!
Steps:
Decide what the document structure you store in Solr will be (somewhat like creating the schema of a relational DB for one table).
Remove the example core and create your own core with that schema.
Once the schema works with no errors (check the logs of the server that hosts the Solr app), you can start feeding your data into Solr. You POST it via HTTP in a specific structure, which is documented in the Solr wiki. Various frameworks have classes to handle that; see the sketch below.
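For example, with the SolrJ client (a rough sketch; the URL, core name, and field names are assumptions, and your own schema dictates the real fields):

// Hedged sketch: add one document to a core over HTTP with SolrJ.
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexOneDoc {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");                   // must match fields defined in your schema
        doc.addField("title", "My first document");
        solr.add(doc);
        solr.commit();                             // make the new document visible to searches
        solr.close();
    }
}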
Marked as Wiki as this is too broad an answer for someone who did not bother to RTFM...
Custom indexing is not a difficult task; I worked on it just a few days ago. First you need to write your document in XML, CSV, or JSON (formats supported by Solr), containing fields according to your schema.xml, then run the following command in example/exampledocs.
For a document mydoc.xml:
./post.sh mydoc.xml
If the status value in the output is 0, indexing was successful and you can search your document in Solr.
Reference: http://www.solrtutorial.com/solr-in-5-minutes.html
Though the question is old, I am writing for new visitors with the same issue. The question can't be answered in a few words. You must understand what Solr is, what the Solr Admin UI is, and why we need Solr instead of a relational database. Then you can understand how to import sample data. I have recently published two articles, Solr Introduction and Importing Sample Data, which might be helpful for you.
http://www.devtrainings.com/2017/03/apache-solr-introduction-and-server.html
http://www.devtrainings.com/2017/03/apache-solr-index-data-and-run-search.html
I need a simple way to read OpenGrok's DB from a PHP script to do some weird searches (doing that in Java in OpenGrok itself is beyond my abilities). So I decided to use Solr as a way to query the Lucene DB directly from another language (probably PHP or C).
The problem is that when I point Solr to /var/opengrok/data, it bombs out with:
java.lang.RuntimeException: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@/var/opengrok/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@3a329572: files: [] at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1103)
(etc, etc, the backtrace is about three screens long)
I tried to point it somewhere inside data with no luck. The structure looks like this:
/var/opengrok/data/index/$projname/segment*
/var/opengrok/data/spelling...
and it seems that whatever Solr is using expects the segments files directly in the index directory.
I checked to see if there's any version discrepancy, but OpenGrok 0.11 is using Lucene 3.0.2 and I've set Solr to LUCENE_30 as the database version.
Any pointers will be greatly appreciated; Google didn't seem to be able to help with this.
OpenGrok's web interface can consume any well-formed search query (through the URL) and reply with XHTML results that are easily parseable, so you're probably making it too complex by hacking inside Lucene rather than using the provided UI...
Hey Guys,
I'm trying to implement some search functionality in an application we're writing.
Solr 1.4.1 running on Tomcat 7
JDBC connection to an MS SQL Server with the view I'm indexing
Solr has finished indexing and the index is working.
To search and communicate with Solr, I have created a little test WCF service (to be integrated with our main service later).
The purpose is to implement a text field in our main application. In this text field the users can start typing something like Paintbrush and gradually filter the list of objects as more and more characters are typed.
This is working just fine and dandy with Solr, up to a certain point. I'm using the wildcard asterisk at the end of my query, and as such I'm throwing a lot of requests like
p*
pa*
pain*
paint*
etc. at the server, and it's returning results just fine (quite impressively fast, actually). The only problem is that once the user types the whole word, the query is paintbrush*, at which point Solr returns 0 results.
So it seems that query+wildcard can only be query+something and not query+nothing.
I managed to get this working under Lucene.Net, but Solr isn't doing things the same way, it seems.
Any advice you can give me on implementing such a feature?
There isn't much code to look at since I'm using SolrNet: http://pastebin.com/tXpe4YUe
I figure it has something to do with the analyzer and parser, but I'm not yet far enough into Solr to know where to look :)
I wouldn't implement suggestions with prefix wildcard queries in Solr. There are other mechanisms better suited to do this. See:
Simple Solr schema problem for autocomplete
Solr TermsComponent: Usage of wildcards
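As a rough illustration of the TermsComponent route (sketched here with SolrJ; SolrNet exposes an equivalent API, and the handler path, field, and core name below are assumptions):

// Hedged sketch: prefix suggestions via the TermsComponent instead of trailing-wildcard queries.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.TermsResponse;

public class Suggest {
    public static void main(String[] args) throws Exception {
        HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/products").build();
        SolrQuery q = new SolrQuery();
        q.setRequestHandler("/terms");   // a request handler with the TermsComponent enabled
        q.setTerms(true);
        q.addTermsField("name");         // field to pull suggestions from
        q.setTermsPrefix("paintbr");     // what the user has typed so far
        q.setTermsLimit(10);
        QueryResponse rsp = solr.query(q);
        for (TermsResponse.Term t : rsp.getTermsResponse().getTerms("name")) {
            System.out.println(t.getTerm() + " (" + t.getFrequency() + ")");
        }
        solr.close();
    }
}

Terms are read straight from the index, so no query-time analysis is involved, which sidesteps the wildcard-versus-analyzer problem described below.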
Stemming seems to be what caused the problem. I fixed it by using a clone of text_ws instead of text for the field type.
My changes to schema.xml: http://pastebin.com/xaJZDgY4
Stemming is disabled and lowercase indexing is enabled. As long as all queries are in lower case, they should always give results (if there are any at all).
The issue seems to be that analyzers don't work with wildcards, so the logic that would make Johnny a result for Johni or Johnni is "broken" when using wildcards.
If you're facing similar problems and my solution here doesn't quite work, you can add debugQuery=on to your query string and see a bit more about what's going on. That helped me narrow down the problem.