How to avoid splitting of field values in faceted search in solr - solr

When doing a facet-based search, each doc element in the search result has fields whose values are multi-word strings, but in the facet every value is a single word.
The following is a sample Solr search result:
<result>
<doc>
<str name="fieldA">abc1 efg1 ijk1</str>
<str name="fieldA">abc2 efg2 ijk2</str>
<str name="fieldA">abc3 efg3 ijk3</str>
<arr name="fieldD">
<str>abc1 efg1 ijk1</str>
<str>abc2 efg2 ijk2</str>
<str>abc3 efg3 ijk3</str>
</arr>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries">
<int name="fieldB:ab">some_number</int>
</lst>
<lst name="facet_fields">
<lst name="fieldA">
<int name="abc1">1</int> <!-- I want <int name="abc1 efg1 ijk1">1</int> instead -->
<int name="efg1">1</int>
<int name="ijk1">1</int>
</lst>
</lst>
</lst>
schema.xml has the fields fieldA, fieldB, fieldC and fieldD, defined as follows:
<field name="fieldA" type="text_general" stored="true" indexed="true"/>
<field name="fieldB" type="text_general" stored="true" indexed="true"/>
<field name="fieldC" type="text_general" stored="true" indexed="true"/>
<field name="fieldD" type="text_general" stored="true" indexed="true"/>
and
<copyField source="fieldA" dest="fieldD"/>
<copyField source="fieldB" dest="fieldD"/>
<copyField source="fieldC" dest="fieldD"/>
I want the facet values to be multi-word strings, just like the field values. Please suggest a solution.

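You have to change the type of your field from type="text_general" to type="string" for faceting. A text_general field is tokenized at index time, and facet counts are computed over the indexed terms, which is why each word shows up as a separate facet value.
If you can't change that field, you can create a new string field (populated by a copyField) and facet on that one instead.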
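The second option could look roughly like the following schema fragment. This is only a sketch: the field name fieldA_facet is an assumption made for illustration, and multiValued="true" is assumed because the sample result shows several fieldA values per document.

```xml
<!-- Hypothetical facet-only field; the name "fieldA_facet" is an assumption -->
<!-- type="string" is not tokenized, so each whole value becomes one facet term -->
<field name="fieldA_facet" type="string" indexed="true" stored="false" multiValued="true"/>

<!-- Keep fieldA as text_general for searching and copy its raw value for faceting -->
<copyField source="fieldA" dest="fieldA_facet"/>
```

With this in place you would facet with facet.field=fieldA_facet instead of facet.field=fieldA. The documents need to be reindexed after the schema change.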

Related

Assign Unique id's across all documents and its children

So the situation is as follows:
Solr has a dataimport directly on the database
I have a table project in a relationship to unit. A project can hold up to 5 units.
IDs are automatically generated by the database, starting at 1.
IDs are unique within each table but not across the database.
Since Solr requires each document to have a unique ID, I created a field solrId which gets its IDs from solr.UUIDUpdateProcessorFactory.
However, the dataimport only fetches a few projects and no units whatsoever. Can someone point me in the right direction?
The relevant passages:
solrconfig.xml:
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">solrId</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
....
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">wiensued-data-config.xml</str>
<str name="update.chain">uuid</str>
</lst>
</requestHandler>
managed-schema:
<uniqueKey>solrId</uniqueKey>
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
<!-- solrId is the real ID -->
<field name="solrId" type="uuid" multiValued="false" indexed="true" stored="true" />
<!-- the ID from the database -->
<field name="id" type="int" multiValued="false" indexed="true" stored="true"/>
The dataimporthandler is configured to index id (from the table) into either projectId or unitId
The stacktrace is:
org.apache.solr.common.SolrException: [doc=null] missing required field: solrId
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:265)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:107)
at org.apache.solr.update.AddUpdateCommand$1.next(AddUpdateCommand.java:212)
at org.apache.solr.update.AddUpdateCommand$1.next(AddUpdateCommand.java:185)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:259)
at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:433)
at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1384)
at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:920)
at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:913)
at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:302)
at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:91)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:254)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:526)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:415)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
at java.lang.Thread.run(Thread.java:748)
However, the solrId is provided as far as I can tell
Just fix this in your DIH config; it will be cleaner and easier.
Prepend a 'p' to the project id to create the unique id, and supply that to Solr. Do the same with the units (prepend 'u'). You get the idea:
<entity name="project" pk="id" query="select concat('p', id) as solrid, ...
Of course the sql depends on your DB.
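A fuller version of that idea might look like the sketch below. It is an illustration under assumptions, not the asker's actual config: the table names project and unit and the target fields projectId and unitId come from the question, but the exact SQL, the use of two sibling root entities, and the explicit field mappings are guesses. It also assumes solrId is changed to a string type, since values such as p1 are not valid UUIDs and a solr.UUIDField would likely reject them.

```xml
<document>
  <!-- Projects: prefix the database id with 'p' to make it unique across tables -->
  <entity name="project" pk="id"
          query="SELECT CONCAT('p', id) AS solrId, id AS projectId FROM project">
    <field column="solrId" name="solrId"/>
    <field column="projectId" name="projectId"/>
  </entity>

  <!-- Units: same approach with a 'u' prefix -->
  <entity name="unit" pk="id"
          query="SELECT CONCAT('u', id) AS solrId, id AS unitId FROM unit">
    <field column="solrId" name="solrId"/>
    <field column="unitId" name="unitId"/>
  </entity>
</document>
```

CONCAT is not available in every database; some use the || operator instead.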

solr set relevancy score in solrconfig

I'm using Solr 4.4 and I want to rank results by relevancy for exact-match words. I have 10 fields and used
copy fields to achieve this, and it's pretty much working fine.
My problem is that exact-match results should appear higher in the order.
Also, how can I set the score?
schema.xml
<field name="field8" type="text_search" indexed="true" stored="true"/>
<field name="description" type="text_search" indexed="true" stored="true"/>
<field name="keywords" type="text_search" indexed="true" stored="true"/>
<copyField source="field8" dest="text"/>
<copyField source="description" dest="text"/>
<copyField source="keywords" dest="text"/>
solrconfig.xml
<requestHandler name="/browse" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<!-- Query settings -->
<str name="defType">edismax</str>
<str name="qf">
field8 description keyword ^10.0
</str>
<str name="df">text</str>
<str name="mm">100%</str>
<str name="q.alt">*:*</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
........
........
........
Phrase Fields pf
Once the list of matching documents has been identified using the fq
and qf parameters, the pf parameter can be used to "boost" the score
of documents in cases where all of the terms in the q parameter appear
in close proximity.
For example, if you search for Apache Solr Lucene with pf set to title:
q=Apache Solr Lucene
& qf=title name
& pf=title
<!--Debug-->
<str name="parsedquery_toString">
+((name:apache | title:apache) (name:solr | title:solr) (name:lucene | title:lucene)) (title:"apache solr lucene")
</str>
Now look at the debug response: it searches for each keyword individually but also for the whole string as a phrase, so it boosts all results that contain the search string as a phrase.
P.S. Again, pf only affects the boost score, not which results match.
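Applied to the request handler from the question, pf could be added to the defaults along these lines. This is a sketch only: the field names are taken from the schema in the question, and the boost value of 10 is an arbitrary assumption that would need tuning.

```xml
<requestHandler name="/browse" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: fields in which the individual query terms are searched -->
    <str name="qf">field8 description keywords</str>
    <!-- pf: extra boost when the query terms occur together as a phrase in these fields -->
    <str name="pf">field8^10 description^10 keywords^10</str>
  </lst>
</requestHandler>
```

Note that a boost is written directly after the field name, with no space before the ^.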

Solr schema field

I've made a schema for solr and I don't know the name of every field from the document I want to add, so I defined a dynamicField like this:
<dynamicField name="*" type="text_general" indexed="true" stored="true" />
Right now I'm testing and I don't get an error when importing for undefined fields in the document, but when I try to query for *:something (anything other than "*") I don't get any results back.
My question is how can I define a catch all field, is there any right way to do this? Or am I under the wrong impression that a query for *:something would normally search in all the documents and all the fields for "something"?
The query `*:something` cannot return anything from Solr, no matter what kind of field you are using, dynamicField or not.
If I understand your question correctly, you want a dynamicField to store all fields and want to query all fields later.
Here is my solution.
First, defining a default_search field for search:
<field name="default_search" type="text" indexed="true" stored="true" multiValued="true"/>
And then copy all fields into the default_search field.
<copyField source="*" dest="default_search" />
Finally, you can make a query for all fields like this:
http://host/core/select/?q=something
or
http://host/core/select/?q=default_search:something
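The first URL form only searches default_search if that field is the default search field. One way to arrange this, sketched here on the assumption that queries go through a /select handler, is the df parameter in the handler defaults:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- df: field searched when the query does not name a field -->
    <str name="df">default_search</str>
  </lst>
</requestHandler>
```

The same parameter can also be passed per request, as in q=something&df=default_search.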
AFAIK, *:something does not query all the fields. It looks for a field named *.
I get the below error when attempting to do a query for *:test
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">9</int>
<lst name="params">
<str name="wt">xml</str>
<str name="q">*:test</str>
</lst>
</lst>
<lst name="error">
<str name="msg">undefined field *</str>
<int name="code">400</int>
</lst>
</response>
You would need to define a catchall field using copyField in your schema.xml.
I would recommend not using a simple wildcard for dynamic fields. Instead something like this:
<dynamicField name="*_text" type="text_general" indexed="true" stored="true" />
and then have a catchall field (multiValued, since several *_text fields may be copied into it for a single document)
<field name="CatchAll" type="text_general" indexed="true" stored="true" multiValued="true" />
You can have a copyField defined as below, to support query such as q=something
<copyField source="*_text" dest="CatchAll" />

solr spatial search with distance to search results

I'm able to return all results within a specific radius from geolocation point A, but I want to return the distance of each search result to point A.
I was reading this: http://wiki.apache.org/solr/SpatialSearch
I have this Solr query:
http://localhost:8983/solr/tt/select/?indent=on&facet=true&fq={!geofilt}&pt=51.4416420,5.4697225&sfield=geolocation&d=20&sort=geodist()%20asc&q=*:*&start=0&rows=10&fl=_dist_:geodist(),id,title,lat,lng,geolocation,location&facet.mincount=1
And this in my schema.xml
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="geolocation" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>
This is one of the results:
<doc>
<str name="geolocation">51.4231086,5.474830699999984</str>
<str name="id">122</str>
<str name="lat">51.4231086</str>
<str name="lng">5.474830699999984</str>
<str name="title">Eindhoven Museum</str>
</doc>
However, with my current query string, I don't see a distance field in the document.
What am I missing?

Solr - Get the sum of all "filemetadata.filesize" field for a given user

I'm building some kind of file storage software.
The files' metadata are indexed with fields like filesize and userId.
What I'd like to do is compute the space used by a user.
For example, if I have these documents:
documentId = 1 | userId = 1 | fileSize = 10
documentId = 2 | userId = 2 | fileSize = 5
documentId = 3 | userId = 1 | fileSize = 3
I'd like to run a query so that for userId=1 I retrieve a result being 13MB (10+3)
I have seen that we can run FunctionQuery but it doesn't seem to do what I'm looking for.
Same for the FieldCollapsing which doesn't permit to run aggregation functions on the grouped results.
I have tested the StatsComponent as well but it doesn't seem to work for unknown reasons.
My schema contains:
<field name="FileSize" type="integer" indexed="false" stored="true" required="true" />
<field name="OtherField" type="sfloat" indexed="true" stored="true" required="false" />
<field name="OtherField2" type="integer" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="OtherField3" type="integer" indexed="true" stored="true" required="false" multiValued="false"/>
And when I perform the query
http://mysolr:8414/solr/mycore/select/?q=docId:123
&rows=0
&stats=true
&stats.field=FileSize
&stats.field=OtherField
&stats.field=OtherField2
&stats.field=OtherField3
I get back the result:
<lst name="stats">
<lst name="stats_fields">
<null name="FileSize"/>
<lst name="OtherField">
<double name="min">6.0</double>
<double name="max">6.0</double>
<long name="count">1</long>
<long name="missing">0</long>
<double name="sum">6.0</double>
<double name="sumOfSquares">36.0</double>
<double name="mean">6.0</double>
<double name="stddev">0.0</double>
<lst name="facets"/>
</lst>
<lst name="OtherField2">
<double name="min">0.0</double>
<double name="max">0.0</double>
<long name="count">1</long>
<long name="missing">0</long>
<double name="sum">0.0</double>
<double name="sumOfSquares">0.0</double>
<double name="mean">0.0</double>
<double name="stddev">0.0</double>
<lst name="facets"/>
</lst>
<null name="OtherField3"/>
</lst>
</lst>
As you can see I'm asking for stats for a single doc (which isn't really useful but helps to debug, anyway without the q=docId:123 it doesn't return me a better result).
This document has a set FileSize of 15
I use Solr 4.1
Can someone please explain to me why I can get stats for fields OtherField and OtherField2, but not for fields FileSize and OtherField3? I don't see the problem at all...
Good news: writing this question helped me find the solution. I use a legacy schema and didn't notice that the FileSize field had indexed="false".
Setting this attribute to true makes the StatsComponent return stats for that field.
However, for the field OtherField3, which has exactly the same definition as OtherField2, I have no answer.
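For reference, the corrected field definition would look like this; only the indexed attribute differs from the schema shown in the question:

```xml
<!-- indexed="true" so that the StatsComponent can compute stats on this field -->
<field name="FileSize" type="integer" indexed="true" stored="true" required="true" />
```

Existing documents have to be reindexed for the change to take effect. Assuming the user id is indexed in a field named userId (the name is taken from the example at the top of the question and may differ in the real schema), the space used by one user could then be read from the sum entry of a request such as q=userId:1&rows=0&stats=true&stats.field=FileSize.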
