How do I query Mongoid in the Rails console so that I skip over a range of records on every iteration? - mongoid

Ruby has the step(n) method, which sets the increment while iterating. When using the 'mongoid' gem, how would we do the same thing, since Mongoid doesn't seem to support step(n)? Here's the link for the Mongoid documentation. Here's the link for the step function.

The Mongoid site docs seem to have left the paging methods out, but the embedded documentation covers skip:
Skips the provided number of documents.
and limit:
Limits the number of documents that are returned from the database.
The Origin docs (Origin is what Mongoid uses for building queries) also cover skip and limit, but you'd need to know where to look or what you're looking for.
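Putting skip and limit together, here is a minimal sketch of emulating Ruby's step(n) against a Mongoid criteria (Band is a hypothetical model; adjust to your own):
step = 5
total = Band.count
# Fetch every step-th document, mirroring (0...total).step(5) over an array
(0...total).step(step) do |offset|
  band = Band.skip(offset).first
  puts band.id
end
Bear in mind that skip gets slower as the offset grows, so for large collections a range query on an indexed, sorted field is usually the better approach.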

Related

How to perform an exact search in Solr

I am implementing Solr search using an API. When I call it with a parameter such as "Chillout Lounge", it returns the collection of documents that are the same as or similar to the string "Chillout Lounge".
But when I search for "Chillout Lounge Box", it returns results which don't have any of these three words. (There are values in the DB which contain all three words, but they are not returned.)
As I understand it, Solr uses fuzzy search, but even then it should return some values which contain at least one of these words.
Or what changes could I make to my schema.xml so that it returns the proper values?
First of all - "Fuzzy search" is a feature you'll have to ask for (by using ~ in standard Lucene query syntax).
If you're talking about regular searches, you can use q.op to select which operator to use. q.op=AND will make sure that all the terms match, while q.op=OR will make any document that contain at least one of the terms be returned. As long as you aren't using fq for this, the documents that match more terms should be scored higher (as the score will add up across multiple terms), and thus, be shown higher in the result set.
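For example, if your handler defaults to OR, requiring all three terms to match could look like this (URL-encoded in practice; which field is searched depends on your df setting):
?q=Chillout+Lounge+Box&q.op=AND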
You can use the debug query feature in the web interface to see scores for each term for a document, and find out why the document was returned at all. If the document doesn't match any terms, it shouldn't be returned, unless you're asking for all documents to be returned.
Be aware that the analyzer chain defined for the field you're searching might affect what is and isn't considered a match.
You'll have to add a proper example to get a more detailed answer.

Displaying information about Solr search results

I am struggling with a little problem where I have to display relevant information about the result set returned from Solr, but I can't figure out how to calculate it without iterating over the results (bad).
Basically, I am storing my documents with a state field, and while the search is supposed to return all documents, the UI has to show "Found 15 entities, 5 are in state A, 3 in state B and 8 in C".
At the moment I am using a rather brittle approach of running the query three times with additional scoping by type, but I'd rather get that information from the one query I am displaying. (There have been some edge cases where the numbers don't add up, and since Solr can return facets I guess there has to be a way to use that functionality in this case.)
I am using Solr 3.5 from Rails with the Sunspot gem.
As you mention yourself, you can use facets for this by setting
facet=true&facet.field=state
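A full request that returns only the counts (no documents) might look like this, assuming a core named collection1 and that state is an indexed field:
http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=state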
I'm not familiar with the Sunspot gem, but looking at the documentation you can use facets like this (assuming Entity is your searchable model):
Entity.search do
  facet :state
end
This should return the states of all entities matched by your query, along with the number of entities in each state. The Sunspot documentation tells me you can read these facets in the following way:
search.facet(:state).rows.each do |facet|
  puts "State #{facet.value} has #{facet.count} entities"
end
Essentially there are three main sets of functions you can use to gather stats from Solr.
The first is faceting:
http://wiki.apache.org/solr/SimpleFacetParameters
There is also grouping (field collapsing):
https://wiki.apache.org/solr/FieldCollapsing
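For the state counts above, a grouped query could look like this (group.ngroups adds the number of distinct states to the response):
?q=*:*&group=true&group.field=state&group.ngroups=true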
And the stats package:
https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
Although stats, facets and grouping may eventually be replaced by the analytics (OLAP) package, which is aimed at Solr 5.0.0:
https://issues.apache.org/jira/browse/SOLR-5302
Good luck.

Searching wiki URLs using Solr

I am trying to index and search a wiki on our intranet using Solr. I have it more-or-less working using edismax but I'm having trouble getting main topic pages to show up first in the search results. For example, suppose I have some URLs in the database:
http://whizbang.com/wiki/Foo/Bar
http://whizbang.com/wiki/Foo/Bar/One
http://whizbang.com/wiki/Foo/Bar/Two
http://whizbang.com/wiki/Foo/Bar/Two/Two_point_one
I would like to be able to search for "foo bar" and have the first link returned as the top result because it is the main page for that particular topic in the wiki. I've tried boosting the title and URL fields in the search, but the fieldNorm value for the document keeps affecting the scores such that sub-pages score higher. In one particular case, the main topic page shows up on the 2nd results page.
Is there a way to make the first URL score significantly higher than the sub-pages so that it shows up in the top five search results?
One possible approach to try:
Create a copyField with your url
Extract the path only (so, no host, no wiki prefix)
Split on / and maybe on spaces
Lowercase
Boost on a phrase or bigram match, or something similar.
If you have a lot of levels, maybe you want a multivalued field, with different depths (starting from the end) getting separate entries. That way a perfect match will get a better score. From here, you should start experimenting with your real searches.
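On the schema side, the copyField/split/lowercase steps could be sketched like this (field and type names are illustrative, and this assumes the source url field holds just the path; solr.PathHierarchyTokenizerFactory emits one token per path prefix, which also gives you the depth-based entries mentioned above):
<fieldType name="url_path" class="solr.TextField">
  <analyzer>
    <!-- /wiki/Foo/Bar becomes tokens /wiki, /wiki/foo, /wiki/foo/bar -->
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="url_tokens" type="url_path" indexed="true" stored="false"/>
<copyField source="url" dest="url_tokens"/>
You could then add url_tokens with a high boost to your edismax qf and tune from there.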

Solr: Number of posted files does not equal maxDoc

I apologize in advance if this question has already been answered somewhere - I wasn't able to find it.
I'm relatively new to Solr, and have been following the instructions given by the tutorial for using the default SimplePostTool to index my data from the command line. I'm currently using Solr 4.0 in my testing.
First, I delete everything in my index by query. Then I point the SimplePostTool to several directories and index tens of thousands of files. In my case, for right now, each XML file is a separate document. Some of the documents may have the same uniqueKey ID. If it matters, the XML document sizes range from 4-60kB.
SimplePostTool returns when it's finished and says 26,541 files were indexed. Then I look in the Admin collection1 page and see Num Docs = 20,985 and Max Doc = 22,921.
I've seen other posts discussing the discrepancy between Num Docs and Max Doc (I feel I understand that overwrite behavior sufficiently). My question is why the number of indexed docs reported by SimplePostTool does not match Max Doc given by the Solr Admin page?
The reason you have different numbers for numDocs and maxDoc:
numDocs represents the number of searchable documents in the index (and will be larger than the number of XML files if some files contain more than one document). maxDoc may be larger, as the maxDoc count includes logically deleted documents that have not yet been removed from the index. You can re-post the sample XML files over and over again as much as you want and numDocs will never increase, because the new documents will constantly be replacing the old.
That is from the official Solr tutorial and applies to older versions as well. In your case, the posted-file count (26,541) can exceed maxDoc (22,921) because some of the overwritten duplicates have already been purged by segment merges; the remaining not-yet-purged deletes account for the gap between maxDoc and numDocs.
You can remove the logically deleted documents by optimizing your index:
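For example, with the collection1 core from your admin page (optimizing forces a merge, which expunges the deleted documents):
curl 'http://localhost:8983/solr/collection1/update?optimize=true'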

Solr get calculated distance while using dismax

I'm starting to think that what I want to do is not possible but thought I would give this a try.
I'm running Solr 3.5.
I currently have two types of search:
A basic spatial query which returns the calculated distance between two points in the score field.
Sample Query from my Solr logs:
?fl=*,score&sort=score+asc&start=0&q={!func}geodist()&sfield=coordinates&pt=59.2363514,18.092783&version=2
A dismax query which allows free text queries on a number of fields.
Sample Query from Solr log:
mm=1&d=100.0&sfield=coordinates&qf=field1^5.0+fields2^3.0&defType=edismax&version=2&fl=*,score&start=1&q=monkeyhopper&pt=59.2363514,18.0927830000&fq={!geofilt}
I want to replace my first query with the dismax query, but I really need to get the calculated distance in the response. Yes, I can calculate the distance programmatically, but I would prefer not to since Solr has already done it for me.
I still want to be able to sort my dismax query "by relevance", by distance or by any other field, so the score given by my boosts could be interesting for sorting, but I don't need it to be returned.
If I understood correctly, you want to have the result of a function in your Solr response. The SOLR-2444 issue is what you're looking for, I guess: it allows you to include pseudo-fields, functions, etc. in the fl parameter. The only problem is that it's been committed only on trunk, so it isn't available in the current Solr release, nor will it be in the coming 3.6 release. You'll have to wait for the 4.0 release, but I don't think that will take a lot of time. Maybe you can already start playing around with a snapshot of the last successful Jenkins build.
Pseudo-fields are now available in Solr 4+, which allows you to do just this.
http://localhost:8983/solr/collection1/browse?q=*:*&rows=1000&wt=xml&pt=37.763649,-122.24313&sfield=store&fl=dist:geodist()
For instance, this request allows me to return a field "dist" which contains the distance of each entry to the stated point.
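Applied to the edismax query from the question above, that could look something like this (parameters carried over from the logged query; dist is just a label for the pseudo-field):
mm=1&d=100.0&sfield=coordinates&qf=field1^5.0+fields2^3.0&defType=edismax&q=monkeyhopper&pt=59.2363514,18.0927830000&fq={!geofilt}&fl=*,score,dist:geodist()
Sorting by distance instead of relevance is then just a matter of adding sort=geodist()+asc.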
