How to perform geospatial search with django-haystack + Solr

I'm currently using django-haystack with Xapian. I couldn't find any documentation on how to perform geospatial queries with Xapian, but there seems to be some momentum on Solr, so I'm currently experimenting with that.
I couldn't get spatialSolr to work properly locally, so for now I'm working with spatial-solr-light, which seems to work fine. It accepts queries like
http://127.0.0.1:8080/solr/select/?q=blahblah&spatial={!radius=1.0%20sort=true}lat:10.0,lng:-10.0
Can anyone point me to a patch for haystack that allows me to pass custom queries like that? I could use raw_search(), but then I can't chain the results. In any case, I would like to find a cleaner way to do something like
sqs.spatial(....)
There are some patches from other people mentioned on the Google group (links below), but most of them are unreachable.
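For reference, here is a minimal sketch of how a URL like the one above can be assembled in Python. The host, port and values are simply the ones from my example, and build_spatial_url is a hypothetical helper name, not part of haystack or spatial-solr-light.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_spatial_url(base, q, lat, lng, radius, sort=True):
    """Build a select URL in the style of the example above (hypothetical helper)."""
    # The 'spatial' value mirrors the format shown in the example URL.
    spatial = "{{!radius={r} sort={s}}}lat:{lat},lng:{lng}".format(
        r=radius, s=str(sort).lower(), lat=lat, lng=lng
    )
    # urlencode takes care of escaping the braces, spaces and colons.
    return base + "?" + urlencode({"q": q, "spatial": spatial})


url = build_spatial_url("http://127.0.0.1:8080/solr/select/", "blahblah", 10.0, -10.0, 1.0)

# Decode the query string again to check what Solr would receive.
params = parse_qs(urlsplit(url).query)
print(params["spatial"][0])
```

This only covers building the raw request; it does not give a chainable SearchQuerySet, which is what I am really after.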
References:
https://github.com/fizx/solr-spatial-light
http://groups.google.com/group/django-haystack/browse_thread/thread/d0e23d45c0baa300/2298b6cf43389e18?lnk=gst&q=Spatial#2298b6cf43389e18
http://groups.google.com/group/django-haystack/browse_thread/thread/f88d625679941d77/420892adac151a64
http://groups.google.com/group/django-haystack/browse_thread/thread/e3a70112ce944b00/33bd673fbaaed0a7?lnk=gst&q=jteam#33bd673fbaaed0a7

If you're not tied to Xapian, look at Django, Sphinx and search by distance. I had a similar problem when I ran across this question and this seems to solve it. Thanks to django-sphinx, it's about as easy to set up as Haystack. Sphinx also seems to offer more flexibility.

Here's a fork of django haystack that adds in support for :
https://github.com/sidmitra/django-haystack-spatialsolrplugin
And corresponding notes are here:
https://github.com/sidmitra/django-haystack-spatialsolrplugin/wiki/_pages

Sidmitra, I made a port of your solution using haystack 1.2.X and Solr 3.4. It has some limitations, to be frank: no support for schema generation at the moment, only the LatLong geo type is supported, and sorting by distance is not perfect (but works).
https://github.com/frutik/django-haystack/tree/1.2.X

I agree with https://github.com/sidmitra/django-haystack-spatialsolrplugin .
It seems to be out-of-date, but I could beat it into shape with some work. Issues I had:
The Java SSP was hard to find, and when I found it, it was the wrong version. http://www.dutchworks.nl/en/home/download.html was the link that worked for me.
The classpaths in the example xml files I found on the net were all wrong; I had to remove .solrext. from all of them.
The plugin was very picky about which directory it lived in; it couldn't talk to anything else until it was happily in solr/lib
solr_backend.py required the following patch (around line 505):
    if self.spatial_query:
        final_query = '{{!spatial circles={lat},{long},{radius} }}{0}'.format(final_query, **self.spatial_query)
I had further issues with setting up solrconfig.xml so that GeoDistanceComponent never loaded before the query had a valid rsp.
In other words, you can certainly make it work, but you have to be able to deal with a number of error messages in both python and java before you get there.
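To see what the patched line produces, here is a small standalone sketch. The function name add_spatial_prefix and the sample values are assumptions for illustration; in the fork the same formatting happens inside solr_backend.py using self.spatial_query.

```python
def add_spatial_prefix(final_query, spatial_query=None):
    """Prefix a query with the spatial local params (illustrative stand-in for the patch)."""
    if spatial_query:
        # Doubled braces are literal braces in str.format; lat, long and radius
        # are filled in from the spatial_query dict.
        final_query = '{{!spatial circles={lat},{long},{radius} }}{0}'.format(
            final_query, **spatial_query
        )
    return final_query


print(add_spatial_prefix("blahblah", {"lat": 10.0, "long": -10.0, "radius": 1.0}))
# prints: {!spatial circles=10.0,-10.0,1.0 }blahblah
```

Without a spatial query the function returns the query unchanged, so non-spatial searches are not affected.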


Scopus DOIs not working in article retrieval APIs

I’m trying to use the Elsapy module to extract the abstracts of documents on certain topics.
I am able to do this but, unfortunately, only for a fraction of the documents found.
For example, a particular search returns 16 documents but I am only able to extract the information (e.g. abstracts) from 4 of them.
Upon further inspection, it seems that the documents I can’t get abstracts for:
- Don’t have a PII
- Have DOIs that don’t work.
I have tested the DOIs in the article retrieval interactive API guide:
- The ones that returned abstracts worked fine
- The other ones return the error:
RESOURCE_NOT_FOUND: The resource specified cannot be found.
This happens even though I have found the original articles and checked that their DOIs are correct.
An example of one that didn’t work is:
Sengupta, N. K., & Sibley, C. G. (2019). The political attitudes and subjective wellbeing of the one percent. Journal of Happiness Studies, 20(7), 2125-2140. doi:10.1007/s10902-018-0038-4
I have found that the ones that do ‘work’ all have the general form:
10.1016/j.ssmph.2019.100471
10.1016/j.apacoust.2015.03.004
Please let me know if you know why this is and how I can fix it.
Thanks for your help :)
The Article Retrieval API works for Elsevier content hosted on sciencedirect.com; all Elsevier articles have PII identifiers. The example DOI 10.1007/s10902-018-0038-4 does not work because it is published by Springer and, consequently, not available on ScienceDirect.
Kindly note that this is not a bug and everything is working as expected.
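As a rough way to pre-filter search results before calling the Article Retrieval API, the DOIs can be split by prefix. This is only a sketch: the 10.1016/ prefix is an assumption taken from the working DOIs in the question, not an official or complete list of Elsevier prefixes, and partition_dois is a hypothetical helper, not part of Elsapy.

```python
def partition_dois(dois, prefix="10.1016/"):
    """Split DOIs into likely-retrievable and skipped, based on a DOI prefix heuristic."""
    retrievable, skipped = [], []
    for doi in dois:
        # DOIs are case-insensitive; also drop an optional leading "doi:" label.
        cleaned = doi.strip().lower()
        if cleaned.startswith("doi:"):
            cleaned = cleaned[len("doi:"):]
        if cleaned.startswith(prefix):
            retrievable.append(cleaned)
        else:
            skipped.append(cleaned)
    return retrievable, skipped


retrievable, skipped = partition_dois([
    "10.1016/j.ssmph.2019.100471",
    "10.1016/j.apacoust.2015.03.004",
    "doi:10.1007/s10902-018-0038-4",
])
print(retrievable)
print(skipped)
```

Abstracts for the skipped DOIs would have to come from another source, since those articles are not hosted on ScienceDirect.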

Why is it not suggested to implement typeahead using Wildcard search?

Most tutorials suggest implementing autosuggest using either the Suggester component or primitive typeahead techniques:
https://blog.griddynamics.com/implementing-autocomplete-with-solr/
However, my question is why no one suggests using a simple wildcard search for this, for example to give name suggestions when the user types mob:
q=name:(*mob*)
Is it feasible to use this approach for implementing autosuggest instead of the other approaches? What would the repercussions be?
The strategy can work - for simple queries. The problem is that when you're querying with wildcards, the analysis chain is not invoked (a bit of a simplification - most filters are not invoked, only those that are MultiTermAware) - so as soon as you type a space, you're out of luck. You can work around this with the ComplexPhraseQuery, but that might not be what you're looking for (and can get expensive in regards to the number of terms quickly).
In your example with a leading wildcard, the query will also be very expensive, since it requires Lucene (Solr's underlying search library) to in effect look at each generated token and see if the text mob appears somewhere inside it. And since no analysis takes place, if you had indexed men's (which would be processed to match just men as a single token in most cases) and searched for men's*, you wouldn't get a hit.
So it works - kind of - but it's not ideal. That's the reason why the suggester was implemented. The suggester component supports many different configuration options to get the behavior you want, as well as (for some backends) context filtering (which would be easier to implement with just a wildcard, since it'd be a regular fq). The suggester also supports weights - while wildcards wouldn't really do that in a proper way.
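The two problems can be illustrated with a toy model in Python. This is not how Solr or Lucene is implemented; the analyze function is a made-up stand-in for an analysis chain, and the wildcard match is a plain scan over the indexed terms.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    """Toy analysis chain: lowercase, split on whitespace, strip a trailing possessive."""
    return [re.sub(r"'s$", "", token) for token in text.lower().split()]


def wildcard_search(index_terms, pattern):
    """Match an unanalyzed wildcard pattern against every indexed term."""
    # A leading wildcard cannot use term ordering, so every term is inspected.
    return sorted(term for term in index_terms if fnmatchcase(term, pattern))


# Index time: the text goes through analysis, so "Men's" is stored as "men".
index_terms = set(analyze("Men's Mobile Automobile"))

print(wildcard_search(index_terms, "*mob*"))   # ['automobile', 'mobile']
print(wildcard_search(index_terms, "men's*"))  # []
print(wildcard_search(index_terms, "men*"))    # ['men']
```

The second search finds nothing because the pattern is compared as typed against terms that were already analyzed at index time.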

Any examples of using a Wandsearcher in Vespa? (After a weighted set query)

Currently I am using the REST interface to query Vespa, which seems to work great, but something tells me that I should be using searchers in the application (bundling the jar file in the application package) to make the client (server-side code) a bit lighter and the whole thing a bit smoother. I have managed to write some simple searcher/processor applications, but this is a bit overwhelming.
So are there any readily available examples?
Basically I want to:
Send to /search?query=someId
Do an ordinary search for the weighted set on this documentID (I guess this one can be handy: https://docs.vespa.ai/documentation/reference/inspecting-structured-data.html)
Take those items in the response, add them to wand item(s), and query for a wand with wandsearcher on a given field, similar to the YQL:
"select * from sources * where wand(interest, some weightedsets));","ranking":"combined_score" and return the matches.
I am also curious: apart from the trouble of string building for the HTTP request I am doing at the moment, are there any performance gains from using a searcher (going the Java route) vs. REST?
Thanks for any insight or code help I can start with.
There is an example of using the WandItem (YQL wand) here: https://docs.vespa.ai/documentation/advanced-ranking.html. See also https://docs.vespa.ai/documentation/using-wand-with-vespa.html, as there are two wand implementations available in Vespa; from the description, it sounds like wand() is what you want for this use case. For the first call you probably want a dedicated document summary to reduce the amount of data fetched for your first query, with the option of serving it out of memory only (see https://docs.vespa.ai/documentation/document-summaries.html).
Also see https://docs.vespa.ai/documentation/searcher-development.html as a general resource on writing searchers.
For your use case it makes a lot of sense to write a searcher to perform these two queries, as your second query depends on the first and you avoid the cost of rendering/HTTP/YQL parsing, which might matter if your client is remote with high network latency.
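If you stay on the REST route in the meantime, the second query can at least be built without manual string concatenation. The sketch below is client-side Python, not a Searcher; the field name interest and the rank profile combined_score come from the question, while build_wand_request and the sample weights are made up for illustration.

```python
import json


def build_wand_request(field, weights, rank_profile="combined_score"):
    """Build a query body for a wand() over a weighted set (hypothetical helper)."""
    # json.dumps renders the token-to-weight map with quoted keys.
    weighted_set = json.dumps(weights, sort_keys=True)
    yql = "select * from sources * where wand({0}, {1});".format(field, weighted_set)
    return {"yql": yql, "ranking": rank_profile}


# Weights as they might come back from the first query (sample values).
body = build_wand_request("interest", {"tag_a": 3, "tag_b": 1})
print(json.dumps(body))
```

The resulting dict can be sent as the JSON body of the search request. A Searcher would do the same two steps inside the container and skip the extra HTTP round trip.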

How to use dapper.fluentmap in Dapper?

Does anyone know, or have a link on, how to use https://github.com/henkmollema/Dapper-FluentMap in my Dapper CRUD? Right now I am using Dapper.Contrib, but we are trying to implement Clean Architecture, in which we remove Dapper.Contrib from our structure. Now I am trying to use Dapper-FluentMap to map the properties, but their documentation is very poor.
I've written an article and a sample that show how to use Dapper-FluentMap:
https://medium.com/dapper-net/custom-columns-mapping-1cd45dfd51d6
After beating my head against a few brick walls, I have established this much as fact (at least as of late 2018, which is after the date of the OP)...
Answering the question "Is FluentMap supposed to work with Dapper.Contrib extensions?", henkmollema (author of Dapper.FluentMap) responds, "Nope, it does not work with Dapper.Contrib".
So there's your answer, user3928241.
However for me as well as for user3928241 and others desperately searching for answers, he adds, "Shameless plug: it does work together with Dommel using the Dapper.FluentMap.Dommel integration component."
YMMV, but I'm pressing on. Going to try Dommel now.

Custom Searcher - Blending of hits from different sources

We have a need for "Blending of hits from different sources"; as per your documentation, it is recommended to write a custom searcher in Java. Is there a demo of this somewhere on GitHub? I wouldn't even know where to start :( I understand I can create search "chains", preferably asynchronous, and then blend results in Java before returning them... but then how would I handle pagination, limits, etc.? This all seems very complicated for someone who doesn't even know Java that much. So I am hoping someone has already written a demo for this? Please? Anyone?
Thank you so much
EDIT to make my question clearer:
We are writing a search engine that fetches data from various websites. Some websites have 10 million indexable items, others only 100,000. When we present the results to the end user, we want to include results from all our sources (when a match applies). Let's say 10 results from each of the websites we crawl, so that they all get an equal amount of attention on the page. If we don't do custom blending, the largest website with the most items wins all our traffic.
I understand that we can send 10 separate queries to Vespa and blend the results in our front end, but that seems very inefficient. Thus the question about a "Custom Searcher". Thank you so much!
That documentation covers some very advanced use cases which you do not have. Are your sources different Vespa schemas or content clusters? If so Vespa will by default blend the hits returned from each according to their relevance scores so there's nothing you need to do.
The two other most common use-cases are:
Some (or all) the data sources are external, so you need to write a Searcher component to fetch the external data and turn it into a Result.
You want the data to be blended in some custom way (rather than by relevance score). If so you need to exclude the default blending Searcher (com.yahoo.prelude.searcher.BlendingSearcher) and write your own.
If you provide some more information about your use cases I can give you some code examples.
EDIT: Use grouping to solve the need explained under "EDIT" in the question:
Create a "siteid" field when feeding (e.g. in document processing).
Use the grouping expression all(group(siteid) each(max(10) output(summary())))
See http://docs.vespa.ai/documentation/grouping.html
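To make the intent of the grouping concrete, here is a plain Python illustration of the result shape it is meant to give: at most N hits per siteid, best first. This is not Vespa code; the hit dictionaries, the relevance key and top_hits_per_site are assumptions for the example.

```python
from collections import defaultdict


def top_hits_per_site(hits, per_site=10):
    """Keep the best `per_site` hits for each siteid, ordered by relevance."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["siteid"]].append(hit)
    return {
        site: sorted(site_hits, key=lambda h: h["relevance"], reverse=True)[:per_site]
        for site, site_hits in groups.items()
    }


# A large site with many matches and a small site with a single match.
hits = [{"siteid": "big", "relevance": 1.0 - i / 100.0} for i in range(50)]
hits.append({"siteid": "small", "relevance": 0.2})

blended = top_hits_per_site(hits, per_site=10)
print({site: len(site_hits) for site, site_hits in blended.items()})
```

With the grouping expression Vespa does this partitioning on the content nodes, so the small site still shows up even though the large one has far more matching documents.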
