Cloudant geo search query with timestamp sorting - cloudant

I have a database of geospatial documents (type: 'Feature' & geometry:...) with a 'date' field as a timestamp (created time).
What is the best way to sort the documents by the 'date' timestamp?
Looking at the docs - https://console.bluemix.net/docs/services/Cloudant/api/cloudant-geo.html#querying-a-cloudant-geo-index, there is no 'sort' parameter on the .geo() query object. Is sorting in memory the only way?

You are correct. There's no way to sort when using the geo index. If you want to do that, you can use Lucene geo as described in this blog post, but the spatial part will be limited to bounding box queries.
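If sorting in memory is acceptable, a minimal Python sketch could look like the following. It assumes the geo query has already returned the full documents as a list of dicts and that 'date' is a numeric timestamp at the top level of each document; the function name and the sample data are hypothetical.

```python
from operator import itemgetter


def sort_by_date(docs, newest_first=True):
    """Return a new list of documents ordered by their 'date' timestamp."""
    return sorted(docs, key=itemgetter("date"), reverse=newest_first)


# Hypothetical documents shaped like the ones described in the question.
docs = [
    {"_id": "a", "type": "Feature", "date": 1500000300,
     "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
    {"_id": "b", "type": "Feature", "date": 1500000900,
     "geometry": {"type": "Point", "coordinates": [1.0, 1.0]}},
    {"_id": "c", "type": "Feature", "date": 1500000100,
     "geometry": {"type": "Point", "coordinates": [2.0, 2.0]}},
]

newest_ids = [d["_id"] for d in sort_by_date(docs)]
oldest_ids = [d["_id"] for d in sort_by_date(docs, newest_first=False)]
```

This only orders the rows that the geo query actually returned, so it fits result sets small enough to hold in memory.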

Related

Migrating SOLR fq to Elasticsearch

I am currently migrating a SOLR app to Elasticsearch and have become stuck on a particular query. The ElasticSearch documentation is rather vague on how to achieve my desired result.
Currently I am trying to convert tagged "fq's" (filter queries) from SOLR into Elasticsearch. I need Elasticsearch to return facets (now known as aggregations) based on my query and filters, but also to show aggregations for the other options in a search.
Although this sounds complicated it is achieved in SOLR simply by adding an "fq" parameter and tagging the filter as follows:
q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype
From the main SOLR help docs this will filter on "doctype:pdf" but also include counts for other doc types in the facet output - again this works fine for me, I am simply trying to recreate this in Elasticsearch.
So far I have tried a "post_filter", which does the job until I wish to apply more than one filter (again something SOLR handles with no problems). You can see an example of how this works and how I want to achieve it at:
https://www.jobsinhealthcare.co.uk/search?latitude=&longitude=&title=&location=&radius=5&type=&salary=0&frequency=year&since=&jobtype=&keywords=&company=&sort=Most+recent&filter[contract_type_estr][33d5667c]=Temporary&filter[job_type_estr][5d370027]=Part+time&filter[job_type_estr][4b45bd05]=Full+time
In the filters/facets on the right of the results you can select multiple "contract type" and/or "job type" and/or "location" and still be shown the facet counts for unselected queries/filters. Please note that Hourly Salary, Annual Salary and Date Added do NOT have this functionality - this is by design.
Any pointers as to how I should be structuring my query would be greatly appreciated.
I think what you need is a global aggregation (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-global-aggregation.html). Inside the top-level aggregation you should use a filter aggregation (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html) as a sub-aggregation to filter only "status:public".
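As a rough illustration of that suggestion, the sketch below builds a request body in Python with a global aggregation, a filter sub-aggregation for "status:public", and a terms aggregation on "doctype" inside it. The helper name and the aggregation names ("all_docs", "public_only", "doctypes") are made up for the example, and the bool query with a filter clause assumes an Elasticsearch version that supports it.

```python
import json


def build_search_body(main_query, selected_doctype):
    """Build a search body whose doctype counts ignore the doctype filter."""
    public_only = {"term": {"status": "public"}}
    return {
        # Hits: main query restricted to public documents.
        "query": {"bool": {"must": [main_query], "filter": [public_only]}},
        # Applied to hits only, after aggregations are computed.
        "post_filter": {"term": {"doctype": selected_doctype}},
        "aggs": {
            "all_docs": {
                # Global aggregation: ignores the search query above.
                "global": {},
                "aggs": {
                    "public_only": {
                        "filter": public_only,
                        "aggs": {
                            "doctypes": {"terms": {"field": "doctype"}}
                        },
                    }
                },
            }
        },
    }


body = build_search_body({"match_all": {}}, "pdf")
body_json = json.dumps(body, indent=2)
```

Because a global aggregation ignores the main query, the counts here cover all public documents. If the counts should also reflect the main query, one option is to put that query inside the filter aggregation as well.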

Getting most frequent terms in a subset of indexed lucene documents

Let's assume the following scenario.
Lucene document: ArticleDocument
Fields: {Id, text, publisherId}
A publisher can publish multiple articles.
Problem
I would like to build word clouds (most frequent words, shingles) for each Publisher Id.
After my investigation, I could find ways to get most frequent terms for the entire Index or a document but not for a subset of documents. I found a similar question but that's Lucene 2.x and I'm hoping there exists an effective way in recent Lucene.
Please could you guide me with a way to perform that in Lucene 4.x (preferred) or 3.x (latest in version 3).
Please note that I cannot make each Publisher a document with all the articles being appended to a field.
That's because I would like to have those words in the cloud to be searchable with corresponding articles (by same publisher id) being the results.
I'm not sure whether maintaining two types of lucene documents (article and publisher) is a good idea in terms of maintenance and performance.
Use Pivot Faceting available in Solr 4.X releases. Pivot faceting allows you to facet within the results of the parent facet.
Generate shingled tokens for the "text" field at indexing time using the Shingle Filter Factory.
For faceting add facet=true&facet.pivot=publisherid,text parameters in your query.
Sample query:
http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true&facet=true&facet.pivot=publisherid,text
Query will return frequent shingles/words with frequency for each "publisherid".
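To turn that response into per-publisher word clouds, something like the Python sketch below could work. It assumes the JSON response exposes the pivot under facet_counts.facet_pivot, keyed by "publisherid,text", with nested "pivot" lists; the function name and the sample response are hypothetical.

```python
def top_terms_by_publisher(response, pivot_key="publisherid,text", limit=10):
    """Map each publisher id to its most frequent (term, count) pairs."""
    pivots = response["facet_counts"]["facet_pivot"][pivot_key]
    clouds = {}
    for publisher in pivots:
        # Each publisher bucket may carry a nested list of term buckets.
        terms = publisher.get("pivot", [])
        ranked = sorted(terms, key=lambda t: t["count"], reverse=True)
        clouds[publisher["value"]] = [
            (t["value"], t["count"]) for t in ranked[:limit]
        ]
    return clouds


# Hypothetical response fragment, trimmed to the parts used above.
sample = {
    "facet_counts": {
        "facet_pivot": {
            "publisherid,text": [
                {"field": "publisherid", "value": 7, "count": 8, "pivot": [
                    {"field": "text", "value": "solr", "count": 3},
                    {"field": "text", "value": "lucene", "count": 5},
                ]},
                {"field": "publisherid", "value": 9, "count": 1},
            ]
        }
    }
}

clouds = top_terms_by_publisher(sample, limit=2)
```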

Sorting by recent access in Lucene / Solr

In my Solr queries, I want to sort most recently accessed documents to the top ("accessed" meaning opened by user action). No other search criterion has weight for me: of the documents with text matching the query, I want them in order of recent use. I can only think of two ways to do this:
1) Include a 'last accessed' date field in each doc to have Solr sort upon. Trie Date fields can be sorted very quickly, I'm told. The problem of course is keeping the field up to date, which would require storing each document's text so I can delete and re-add any document with an updated 'last accessed' field. Mutable fields would obviate this, but Lucene/Solr still doesn't offer mutable fields.
2) Alternatively, store the mutable 'last accessed' dates and keep them updated in another db. This would require Solr to return the full list of matching documents, which could be upwards of hundreds of thousands of documents. This huge list of document ids would then be matched up against dates in the db and then sorted. It would work OK for uncommon search terms, but not for broad, common search terms.
So the trade-off is between 1) index size plus a processing cost every time a document is accessed and 2) big query overhead, especially for unfocused search terms.
Do I have any alternatives?
http://lucidworks.lucidimagination.com/display/solr/Solr+Field+Types#SolrFieldTypes-WorkingwithExternalFiles
http://blog.mikemccandless.com/2012/01/tochildblockjoinquery-in-lucene.html
You should be able to do this with the atomic update functionality.
http://wiki.apache.org/solr/Atomic_Updates
This functionality is available as of Solr 4.0. It allows you to update a single field in a document without having to reindex the entire document. I only know about this functionality from the documentation. I have not used it myself, so I can't say how well it works or if there are any pitfalls.
Definitely use option 1, using SOLR queries and updating the lastAccessed field as needed.
Since SOLR 4.0, partial document updates are supported in several flavours: https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
For your application it seems that a simple atomic update would be sufficient.
With respect to performance, this should work very well for large collections and fast document updates.
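A minimal Python sketch of the JSON body for such an atomic update follows. It uses the "set" modifier to replace a single field; the field name lastAccessed comes from the answer above, while the helper name and the document id are hypothetical. Sending the body to the update handler is left out.

```python
import json
from datetime import datetime, timezone


def last_accessed_update(doc_id, when, field="lastAccessed"):
    """Build the JSON body of an atomic update that sets one date field.

    `when` should be a timezone-aware datetime; it is converted to UTC
    and formatted the way Solr date fields expect.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps([{"id": doc_id, field: {"set": stamp}}])


payload = last_accessed_update(
    "doc-42", datetime(2013, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
)
```

Note that atomic updates rebuild the document from its stored fields, so the other fields generally need to be stored in the index for this to work.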

Filter by user specific data but also using a searchengine like Solr

I'm using a relational DB for items, and index them with Solr to get the fast full-text search that Solr provides. But at the same time, I need the user to be able to filter by item status, which is of course a value particular to this user.
An ItemUserStatus value is an association between: an item, a user and a status, so it's a different table.
So I need to use the searching capabilities of Solr, but need in the same query to filter by user specific information that does not seem indexable to me.
An example query would sound like: get me the items with title "Title" that you have set in the "Pending" state.
I'm not sure what is the best way to do this, or if I'm using the right tools.
Thanks,
Stefan
When designing your Solr schema, you need to denormalize/flatten your data. In this case, it seems that you're searching for "items", so the schema would revolve around items. When populating your Solr index, you'd have a field "Title" and another dynamic field "State", so your query would be as simple as Title:something State_123:Pending where 123 is the user id.
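The flattening step might be sketched in Python as below. It assumes the schema declares a dynamic field matching State_* and that the relational rows are available as plain dicts; the function name, column names and sample rows are hypothetical.

```python
def flatten_item(item, statuses):
    """Merge an item row and its per-user status rows into one Solr doc."""
    doc = {"id": item["id"], "Title": item["title"]}
    for row in statuses:
        if row["item_id"] == item["id"]:
            # One dynamic field per user, e.g. State_123 for user 123.
            doc[f"State_{row['user_id']}"] = row["status"]
    return doc


# Hypothetical rows from the item and ItemUserStatus tables.
item = {"id": 1, "title": "Title"}
statuses = [
    {"item_id": 1, "user_id": 123, "status": "Pending"},
    {"item_id": 1, "user_id": 456, "status": "Done"},
    {"item_id": 2, "user_id": 123, "status": "Done"},
]

doc = flatten_item(item, statuses)
```

With documents shaped like this, the per-user filter becomes an ordinary field query such as State_123:Pending.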
Take a look at RavenDB. It is a document-oriented database built on top of Lucene, so you get a hybrid of Solr and a database, with the exception it is not meant to be a search engine, but rather a full featured doc-database with full-text search support on text fields.
A Linq expression to query RavenDB for your example would then be:
from doc in docs
where doc.Title == "Title" && doc.State == DocState.Pending
select doc

Wildcard to select all items in Solr

I'm currently using Local Solr for geo searching. It takes lat and long parameters as well as a search query. I want to create "nearby" functionality, where I provide a location but not a search query. Is there a way to provide a wildcard query that matches all elements and then order by distance? Or is it best to create another field and place the same value in it for every document?
Thanks.
You can use the query *:* to match all values in all fields.
See http://wiki.apache.org/solr/FAQ#How_can_I_delete_all_documents_from_my_index.3F for an example on how to query all documents using the *:* wildcard.
See also http://wiki.apache.org/solr/SolrQuerySyntax for general Solr syntax help.
You may use Solr spatial search to sort by distance, and your query can be *:* if you want to pull all documents from your index.
e.g. ?q=*:*&sfield=search_field&pt=22.12,-55.56&sort=geodist() asc
http://wiki.apache.org/solr/SpatialSearch
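When building that query string in code, the special characters need URL encoding. A small Python sketch, reusing the parameter names from the example above (the helper name is hypothetical, and search_field stands for whatever spatial field the schema defines):

```python
from urllib.parse import parse_qs, urlencode


def nearby_query(lat, lon, sfield="search_field"):
    """Build an encoded query string: match everything, nearest first."""
    params = {
        "q": "*:*",                # match all documents
        "sfield": sfield,          # spatial field used by geodist()
        "pt": f"{lat},{lon}",      # reference point as "lat,lon"
        "sort": "geodist() asc",   # closest documents first
    }
    return urlencode(params)


query_string = nearby_query(22.12, -55.56)
# Decoding the string again recovers the original parameter values.
decoded = parse_qs(query_string)
```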
