solr spatial search with distance to search results

I'm able to return all results within a specific radius from geolocation point A, but I want to return the distance of each search result to point A.
I was reading this: http://wiki.apache.org/solr/SpatialSearch
I have this Solr query:
http://localhost:8983/solr/tt/select/?indent=on&facet=true&fq={!geofilt}&pt=51.4416420,5.4697225&sfield=geolocation&d=20&sort=geodist()%20asc&q=*:*&start=0&rows=10&fl=_dist_:geodist(),id,title,lat,lng,geolocation,location&facet.mincount=1
And this in my schema.xml
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="geolocation" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
This is one of the results:
<doc>
<str name="geolocation">51.4231086,5.474830699999984</str>
<str name="id">122</str>
<str name="lat">51.4231086</str>
<str name="lng">5.474830699999984</str>
<str name="title">Eindhoven Museum</str>
</doc>
However, with my current query string, I don't see a distance field in the document.
What am I missing?

Related

solr set relevancy score in solrconfig

I'm using Solr 4.4 and want to rank results by relevancy, with exact-match words ranked highest. I have 10 fields and use
copy fields to search across them, and for the most part that works fine.
My problem is that exact-match results should appear higher in the result order.
Also, how can I set the score?
schema.xml
<field name="field8" type="text_search" indexed="true" stored="true"/>
<field name="description" type="text_search" indexed="true" stored="true"/>
<field name="keywords" type="text_search" indexed="true" stored="true"/>
<copyField source="field8" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="keywords" dest="text"/>
solrconfig.xml
<requestHandler name="/browse" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!-- Query settings -->
<str name="defType">edismax</str>
<str name="qf">
field8 description keyword ^10.0
</str>
<str name="df">text</str>
<str name="mm">100%</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
........
........
........

Phrase Fields pf
Once the list of matching documents has been identified using the fq
and qf parameters, the pf parameter can be used to "boost" the score
of documents in cases where all of the terms in the q parameter appear
in close proximity.
For example, if you search for Apache Solr Lucene with pf set to title:
q=Apache Solr Lucene
& qf=title name
& pf=title
<!--Debug-->
<str name="parsedquery_toString">
+((name:apache | title:apache) (name:solr | title:solr) (name:lucene | title:lucene)) (title:"apache solr lucene")
</str>
Now, if you look at the debug response, you can see that it searches for each single keyword but also searches for the whole query as a phrase. So it boosts all search results that contain the search string as a phrase.
P.S. Again, pf only affects the boost (the score), not which documents are returned.
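Applied to the handler from the question, this could look roughly like the sketch below. It assumes the fields field8, description and keywords from the question's schema.xml; the boost values are purely illustrative, and the qf entry is written as keywords^10.0 (no space, plural field name), which is an assumption about what the question intended.

```xml
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: fields searched term by term; a boost is attached directly to the field name -->
    <str name="qf">field8 description keywords^10.0</str>
    <!-- pf: documents that contain the whole query as a phrase in these fields
         get an additional score boost (illustrative values) -->
    <str name="pf">field8^20 description^20 keywords^50</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <str name="fl">*,score</str>
  </lst>
</requestHandler>
```

With debugQuery=true on the request, the parsed query should then show the extra phrase clauses in the same way as the debug output above.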

Solr Deduplication (dedupe) giving all zeros in signatureField

I've followed the examples listed in the documentation here: http://wiki.apache.org/solr/Deduplication and https://cwiki.apache.org/confluence/display/solr/De-Duplication
However, when analyzing the results every signatureField gets returned like so:
0000000000000000
I can't seem to figure out why a unique signature isn't being generated.
Relevant config sections:
solrconfig.xml
<requestHandler name="/update"
class="solr.XmlUpdateRequestHandler">
<!-- See below for information on defining
updateRequestProcessorChains that can be used by name
on each Update Request
-->
<lst name="defaults">
<str name="update.chain">dedupe</str>
</lst>
</requestHandler>
...
<!-- Deduplication
An example dedup update processor that creates the "id" field
on the fly based on the hash code of some other fields. This
example has overwriteDupes set to false since we are using the
id field as the signatureField and Solr will maintain
uniqueness based on that anyway.
-->
<updateRequestProcessorChain name="dedupe">
<processor class="solr.processor.SignatureUpdateProcessorFactory">
<bool name="enabled">true</bool>
<str name="signatureField">signatureField</str>
<bool name="overwriteDupes">false</bool>
<str name="fields">name,features,cat</str>
<str name="signatureClass">solr.processor.Lookup3Signature</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
schema.xml
<fields>
<!-- Valid attributes for fields:
name: mandatory - the name for the field
type: mandatory - the name of a previously defined type from the
<types> section
indexed: true if this field should be indexed (searchable or sortable)
stored: true if this field should be retrievable
multiValued: true if this field may contain multiple values per document
omitNorms: (expert) set to true to omit the norms associated with
this field (this disables length normalization and index-time
boosting for the field, and saves some memory). Only full-text
fields or fields that need an index-time boost need norms.
Norms are omitted for primitive (non-analyzed) types by default.
termVectors: [false] set to true to store the term vector for a
given field.
When using MoreLikeThis, fields used for similarity should be
stored for best performance.
termPositions: Store position information with the term vector.
This will increase storage costs.
termOffsets: Store offset information with the term vector. This
will increase storage costs.
default: a value that should be used if no value is specified
when adding a document.
-->
<field name="signatureField" type="string" stored="true" indexed="true" multiValued="false" />
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="sku" type="text_en_splitting_tight" indexed="true" stored="true" omitNorms="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
<field name="manu" type="text_general" indexed="true" stored="true" omitNorms="true"/>
<field name="cat" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
... etc
I'm wondering if anyone can steer me in the right direction?

Query multiple collections with different fields in solr

Given the following (single core) queries:
http://localhost/solr/a/select?indent=true&q=*:*&rows=100&start=0&wt=json
http://localhost/solr/b/select?indent=true&q=*:*&rows=100&start=0&wt=json
The first query returns "numFound":40000.
The second query returns "numFound":10000.
I tried putting these together by:
http://localhost/solr/a/select?indent=true&shards=localhost/solr/a,localhost/solr/b&q=*:*&rows=100&start=0&wt=json
Now I get "numFound":50000.
The only problem is that "a" has more fields than "b", so the multi-collection request only returns the values of a.
Is it possible to query multiple collections with different fields, or do they have to be the same? And how should I change my third URL to get this result?

What you need is what I call a unification core. That core itself will have no content; it is only used as a sort of wrapper to unify the fields you want to display from both cores. In there you will need
a schema.xml that wraps up all the fields that you want to have in your unified result
a query handler that combines the two different cores for you
An important restriction up front, taken from the Solr Wiki page about DistributedSearch:
Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml) The unique key field must be unique across all shards. If docs with duplicate unique keys are encountered, Solr will make an attempt to return valid results, but the behavior may be non-deterministic.
As an example, I have shard-1 with the fields id, title, description and shard-2 with the fields id, title, abstractText. So I have these schemas:
schema of shard-1
<schema name="shard-1" version="1.5">
<fields>
<field name="id"
type="int" indexed="true" stored="true" multiValued="false" />
<field name="title"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="description"
type="text" indexed="true" stored="true" multiValued="false" />
</fields>
<!-- type definition left out, have a look in github -->
</schema>
schema of shard-2
<schema name="shard-2" version="1.5">
<fields>
<field name="id"
type="int" indexed="true" stored="true" multiValued="false" />
<field name="title"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="abstractText"
type="text" indexed="true" stored="true" multiValued="false" />
</fields>
<!-- type definition left out, have a look in github -->
</schema>
To unify these schemas I create a third schema that I call shard-unification, which contains all four fields.
<schema name="shard-unification" version="1.5">
<fields>
<field name="id"
type="int" indexed="true" stored="true" multiValued="false" />
<field name="title"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="abstractText"
type="text" indexed="true" stored="true" multiValued="false" />
<field name="description"
type="text" indexed="true" stored="true" multiValued="false" />
</fields>
<!-- type definition left out, have a look in github -->
</schema>
Now I need to make use of this combined schema, so I create a query handler in the solrconfig.xml of the shard-unification core:
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="qf">id title description abstractText</str>
<str name="fl">*,score</str>
<str name="mm">100%</str>
</lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />
That's it. Now some index data is required in shard-1 and shard-2. To query for a unified result, just query shard-unification with the appropriate shards param.
http://localhost/solr/shard-unification/select?q=*:*&rows=100&start=0&wt=json&shards=localhost/solr/shard-1,localhost/solr/shard-2
This will return a result like:
{
"responseHeader":{
"status":0,
"QTime":10},
"response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
{
"id":1,
"title":"title 1",
"description":"description 1",
"score":1.0},
{
"id":2,
"title":"title 2",
"abstractText":"abstract 2",
"score":1.0}]
}}
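As an alternative to passing shards on every request, the parameter can probably also be set as a default in the unification core's handler. The following is only a sketch, not part of the original sample: it assumes the same host and core names as the query URL above. It should only be done on the unification core, because a shards default on a handler that the shards themselves use would make the request recurse.

```xml
<!-- solrconfig.xml of the shard-unification core (sketch) -->
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score</str>
    <str name="mm">100%</str>
    <!-- assumption: host and core names taken from the example URL -->
    <str name="shards">localhost/solr/shard-1,localhost/solr/shard-2</str>
  </lst>
</requestHandler>
```

With that default in place, a plain request to the shard-unification core's select handler, without a shards parameter in the URL, should return the same unified result.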
Fetch the origin shard of a document
If you want each document to include the shard it originates from, you just need to specify [shard] within fl, either as a parameter with the query or within the request handler's defaults (see below). The brackets are mandatory; they will also appear in the resulting response.
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="qf">id title description abstractText</str>
<str name="fl">*,score,[shard]</str>
<str name="mm">100%</str>
</lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />
Working Sample
If you want to see a running example, check out my solrsample project on GitHub and execute the ShardUnificationTest. It now also includes the shard fetching.

Shards should be used in Solr
"When an index becomes too large to fit on a single system, or when a single query takes too long to execute"
so the number and names of the fields should always be the same. This is specified in this document (which the quote above also comes from):
http://wiki.apache.org/solr/DistributedSearch
If you leave your query as it is and give the two shards the same fields, this should just work as expected.
If you want more information about how shards work in SolrCloud, have a look at this document as well:
http://wiki.apache.org/solr/SolrCloud

How to avoid splitting of field values in faceted search in solr

When doing a facet-based search, the doc elements in the search result contain field values that are multi-word strings, but in the facet counts every value is a single word.
The following is a sample Solr search result:
<result>
<doc>
<str name="fieldA">abc1 efg1 ijk1</str>
<str name="fieldA">abc2 efg2 ijk2</str>
<str name="fieldA">abc3 efg3 ijk3</str>
<arr name="fieldD">
<str>abc1 efg1 ijk1</str>
<str>abc2 efg2 ijk2</str>
<str>abc3 efg3 ijk3</str>
</arr>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries">
<int name="fieldB:ab">some_number</int>
</lst>
<lst name="facet_fields">
<lst name="fieldA">
<int name="abc1">1</int> <!-- I want <int name="abc1 efg1 ijk1">1</int> instead -->
<int name="efg1">1</int>
<int name="ijk1">1</int>
</lst>
</lst>
</lst>
schema.xml defines the fields fieldA, fieldB, fieldC and fieldD as follows:
<field name="fieldA" type="text_general" stored="true" indexed="true"/>
<field name="fieldB" type="text_general" stored="true" indexed="true"/>
<field name="fieldC" type="text_general" stored="true" indexed="true"/>
<field name="fieldD" type="text_general" stored="true" indexed="true"/>
and
<copyField source="fieldA" dest="fieldD"/>
<copyField source="fieldB" dest="fieldD"/>
<copyField source="fieldC" dest="fieldD"/>
I want the facet values to be the full multi-word strings, just like the field values. Please suggest how to do this.

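For faceting, you have to change the type of your field from type="text_general" to type="string". A text_general field is tokenized, so each word becomes its own facet value, whereas a string field is indexed as one untokenized value.
If you can't do that for the field itself, you can create a new string field (it could be filled by a copyField) and facet on that one.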
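A minimal sketch of the copyField variant for fieldA follows. The field name fieldA_facet is hypothetical, the string type is assumed to be defined as solr.StrField as in the stock example schema, and multiValued="true" is an assumption based on the sample document showing several fieldA values.

```xml
<!-- existing analyzed field, kept for full-text search -->
<field name="fieldA" type="text_general" stored="true" indexed="true"/>

<!-- hypothetical untokenized copy, used only for faceting -->
<field name="fieldA_facet" type="string" stored="false" indexed="true" multiValued="true"/>

<!-- copyField copies the raw input value, before any analysis is applied -->
<copyField source="fieldA" dest="fieldA_facet"/>
```

After reindexing, faceting with facet.field=fieldA_facet instead of facet.field=fieldA should return whole values such as "abc1 efg1 ijk1", while searches can still go against fieldA.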

Solr schema field

I've made a schema for solr and I don't know the name of every field from the document I want to add, so I defined a dynamicField like this:
<dynamicField name="*" type="text_general" indexed="true" stored="true" />
Right now I'm testing and I don't get an error when importing for undefined fields in the document, but when I try to query for *:something (anything other than "*") I don't get any results back.
My question is how can I define a catch all field, is there any right way to do this? Or am I under the wrong impression that a query for *:something would normally search in all the documents and all the fields for "something"?

The search keyword `*:something` cannot return anything from Solr, no matter what kind of field you are using, dynamicField or not.
If I understand your question correctly, you want a dynamicField to store all fields and want to query all fields later.
Here is my solution.
First, define a default_search field for searching:
<field name="default_search" type="text" indexed="true" stored="true" multiValued="true"/>
And then copy all fields into the default_search field.
<copyField source="*" dest="default_search" />
Finally, you can make a query for all fields like this:
http://host/core/select/?q=something
or
http://host/core/select/?q=default_search:something

AFAIK *:something does not query all the fields. It looks for a field named *.
I get the below error when attempting to do a query for *:test
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">9</int>
<lst name="params">
<str name="wt">xml</str>
<str name="q">*:test</str>
</lst>
</lst>
<lst name="error">
<str name="msg">undefined field *</str>
<int name="code">400</int>
</lst>
</response>
You would need to define a catchall field using copyField in your schema.xml.
I would recommend not using a simple wildcard for dynamic fields. Instead something like this:
<dynamicField name="*_text" type="text_general" indexed="true" stored="true" />
and then have a catchall field
<field name="CatchAll" type="text_general" indexed="true" stored="true" multiValued="true" />
You can have a copyField defined as below, to support query such as q=something
<copyField source="*_text" dest="CatchAll" />
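For a bare q=something to actually hit the catch-all field, that field also has to be the default search field. One way to do this is the df parameter in the request handler defaults; the sketch below assumes a handler named /select and the CatchAll field from above.

```xml
<!-- solrconfig.xml (sketch; the handler name is an assumption) -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- df: field searched when the query does not name one -->
    <str name="df">CatchAll</str>
  </lst>
</requestHandler>
```

Without such a default, the query can still name the field explicitly, for example q=CatchAll:something.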