Apache solr date field in views - solr

I have a custom date field, field_last_archived_date, in one of my content types.
There is a corresponding entry in the Apache solr field list called dm_field_last_archived_date.
I am facing two problems:
When I try to sort a Solr view on this field, I get the error "cannot sort on multivalued field."
When I try to use this field as an exposed filter to provide a date range, I'm not sure which date format to supply. I have tried formats like "2011-10-01T23:59:59Z", "2011-10-01 23:59:59", a plain Unix timestamp, etc., but all of them throw the error "Invalid Date String:'OctoberAMCECESTAM+02:001_SunAMCESTE_1nd+02008601'".
Any idea what I am doing wrong here?
Thanks...

dm_field_last_archived_date is a multivalued field, and Solr does not support sorting on multivalued fields.
To confirm this behavior, try sorting on a single-valued field.
You can check whether a field is multivalued in Solr's schema file; a multivalued definition looks like this:
<field name="yourFieldName" type="tint" indexed="true" stored="true" omitNorms="true" multiValued="true" default="defaultValue"/>
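The schema check described above can also be scripted. The following is a minimal sketch, assuming a classic schema.xml layout; the schema fragment, the dm_*/ds_* patterns and the helper names are hypothetical and only illustrate the idea. An absent multiValued attribute is treated as false here, although in a real schema the field type may supply a different default.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical schema fragment for illustration; a real schema.xml has many more entries.
SCHEMA_XML = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <dynamicField name="dm_*" type="tdate" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="ds_*" type="tdate" indexed="true" stored="true" multiValued="false"/>
  </fields>
</schema>
"""


def multivalued_fields(schema_xml):
    """Return the names (or dynamic patterns) declared with multiValued="true"."""
    root = ET.fromstring(schema_xml)
    names = []
    for tag in ("field", "dynamicField"):
        for node in root.iter(tag):
            # Assumption: a missing attribute means single-valued.
            if node.get("multiValued", "false").lower() == "true":
                names.append(node.get("name"))
    return names


def is_multivalued(field_name, schema_xml):
    """Check a concrete field name against the multivalued names and patterns."""
    return any(
        fnmatch.fnmatchcase(field_name, pattern)
        for pattern in multivalued_fields(schema_xml)
    )


if __name__ == "__main__":
    print(multivalued_fields(SCHEMA_XML))
    print(is_multivalued("dm_field_last_archived_date", SCHEMA_XML))
```

If the field turns out to be multivalued, sorting needs a single-valued counterpart of the same data.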

Related

Solrj indexing mechanism

I have a question about the indexing mechanism when using Solr from Java. If I create documents and only want to search on the field "name", will Solr index all fields, or only the "name" field in each document?
If you tell Solr to only store the field name in your schema, then only the field name will be stored.
If you instruct Solr to store everything you send to it (like in the schemaless mode) and you send 400 fields, each of those fields will be stored.
Only the fields you are going to query need to be indexed. Fields you want returned but never search on can be stored only. If you don't need a field's content returned but do want to search on it, set stored to false and indexed to true.
In schema.xml, where you define the fields in use, set indexed="true" on every field you want to search on.
In your case it would look something like this -
<field name="name" type="string" indexed="true" stored="true" />
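To make the indexed/stored distinction concrete, here is a toy model in Python. It is only a sketch of the concept, not how Solr is implemented: the ToyIndex class and the three example fields are hypothetical, and fields missing from the schema are silently ignored, which is a simplification.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model of indexed vs. stored flags; illustrative only."""

    def __init__(self, schema):
        # schema maps a field name to {"indexed": bool, "stored": bool}
        self.schema = schema
        self.inverted = defaultdict(set)  # (field, value) -> set of doc ids
        self.stored = {}                  # doc id -> fields kept for retrieval

    def add(self, doc_id, doc):
        kept = {}
        for field, value in doc.items():
            flags = self.schema.get(field)
            if flags is None:
                continue  # simplification: unknown fields are dropped
            if flags["indexed"]:
                self.inverted[(field, value)].add(doc_id)  # searchable
            if flags["stored"]:
                kept[field] = value                        # returnable
        self.stored[doc_id] = kept

    def search(self, field, value):
        ids = sorted(self.inverted.get((field, value), set()))
        return [self.stored[i] for i in ids]


if __name__ == "__main__":
    index = ToyIndex({
        "name": {"indexed": True, "stored": True},
        "notes": {"indexed": False, "stored": True},
        "tags": {"indexed": True, "stored": False},
    })
    index.add(1, {"name": "alice", "notes": "first", "tags": "admin"})
    print(index.search("name", "alice"))   # found; returns name and notes
    print(index.search("notes", "first"))  # not indexed, so no match
    print(index.search("tags", "admin"))   # found, but tags is not returned
```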

Copy-transform a numeric field in Solr?

I have a dynamic numeric multivalued field in my solr schema -
<dynamicField name="*_nm" type="float" indexed="true" stored="true" multiValued="true" omitNorms="false"/>
I'd like to run a function score on said field -
_val_:"if(exists(features.width_nm),mul(exp(div(pow(max(0,sub(abs(sub(features.width_nm,12.00000)),0.00000)),2),-51.93702)),10.00000),0.000000)"
but function queries on multivalued fields aren't properly supported in my version of Solr (5.2.1). Trying the above gives the error -
"can not use FieldCache on multivalued field"
My current work-around for this is during indexing to create another field, numeric single-valued, which contains a "reduced" form of the multivalues.
Currently I do this in Java code.
Is there any way for me to do this directly in Solr, for example using a "copy-field"?
Just for completeness: in Solr 6.3 I am able to calculate a function score on a multivalued field by using the field function with a min/max parameter described here.
Thank you very much!
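For reference, the index-time workaround from the question can be sketched on the client side as follows. This is a Python rendering under stated assumptions, not the asker's Java code: the output suffix _min_n, the choice of min as the reducer, and both function names are hypothetical. The second function restates the function query from the question in plain arithmetic so the reduced value can be checked against it.

```python
import math


def add_reduced_fields(doc, suffix="_nm", reducer=min, out_suffix="_min_n"):
    """Return a copy of doc with a single-valued companion for each multivalued numeric field.

    out_suffix is a hypothetical naming convention; the schema would need a
    matching single-valued dynamic field.
    """
    out = dict(doc)
    for name, values in doc.items():
        if name.endswith(suffix) and isinstance(values, (list, tuple)) and values:
            base = name[: -len(suffix)]
            out[base + out_suffix] = float(reducer(values))
    return out


def width_score(width, target=12.0, tolerance=0.0, scale=-51.93702, weight=10.0):
    """Plain-Python form of the function query; None stands for a missing field."""
    if width is None:
        return 0.0
    distance = max(0.0, abs(width - target) - tolerance)
    return math.exp(distance ** 2 / scale) * weight


if __name__ == "__main__":
    doc = {"id": "a", "features.width_nm": [14.0, 12.0, 20.0]}
    reduced = add_reduced_fields(doc)
    print(reduced)
    print(width_score(reduced["features.width_min_n"]))
```

Which reducer is appropriate (min, max, closest to the target) depends on what the score is meant to express.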

Solr not including fields having empty value in result

The data indexed in Solr contains some fields with empty values. When I run q=*:*, the results do not include the fields that have empty values. What parameter do I need to pass with the query to get those fields in the result?
EDIT :
I am indexing data using a csv file, entries in file are as follows :
id, dob, name
1,,name1
2,,name2
Now when I search for the top 10 records, I get only two fields. I want to get all fields, even those with no value stored.
The field should have stored="true".
Check the definition of the dob field in your schema.xml file. It should have stored="true":
<field name="dob" type="text_general" indexed="true" stored="true"/>
Reindex the documents and query again; it should work.
Hope this helps.
If an item doesn't have a piece of data, Solr doesn't store the field. You should be able to force storage of an empty string by setting the field attributes required="true" default="".
What do you mean by empty fields? Are these fields set as indexed=true? Are you sending an empty value, say spaces, as the data when you index these fields? It looks like you are not sending even a blank placeholder value for this field, which is why this happens. For example, if I send data in the format {"id":"change.me","title":" "}, where the title field is blank, it gets indexed. But if you send data like {"id":"change.me","title":}, Solr returns an error.
Add wt=csv to a query to export a well-formed CSV file.
Specify the fields you want returned using fl=.
Example:
select?fl=id,foo,bar&indent=on&q=field:value&stored=true&rows=1000&start=0&wt=json
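The fl and wt parameters mentioned above can be assembled programmatically, and a client can also fill in absent fields itself once the response arrives. The sketch below assumes a hypothetical core URL and uses the id/dob/name fields from the question; both helper names are illustrative.

```python
from urllib.parse import urlencode


def select_url(base_url, fields, query="*:*", rows=10, wt="json"):
    """Build a /select URL that requests specific fields via fl."""
    params = {"q": query, "fl": ",".join(fields), "rows": rows, "wt": wt}
    return base_url + "/select?" + urlencode(params)


def fill_missing(docs, fields, placeholder=""):
    """Give every document the same keys, using a placeholder for absent fields."""
    return [{field: doc.get(field, placeholder) for field in fields} for doc in docs]


if __name__ == "__main__":
    fields = ["id", "dob", "name"]
    # Hypothetical core URL; adjust host, port and core name to your setup.
    print(select_url("http://localhost:8983/solr/collection1", fields, wt="csv"))

    # Documents as Solr might return them when dob has no value.
    docs = [{"id": "1", "name": "name1"}, {"id": "2", "name": "name2"}]
    print(fill_missing(docs, fields))
```

Filling in placeholders on the client keeps the index unchanged, at the cost of not distinguishing a field that was never sent from one that was sent empty.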

Sort on field completeness of Solr Documents

I have this Solr field
<field name="listing_thumbnail" type="string" indexed="false" stored="true"/>
Now, when the results are shown, documents without a value for this field should appear last. Is this possible in Solr? To generalise, is it possible to sort documents on field completeness?
You can make use of the bq (Boost Query) parameter of the dismax/edismax query handler. It lets you check whether a field is empty and adjust the score accordingly, but the field needs to be indexed=true for this to work.
If your field were indexed, you could add bq=(listing_thumbnail:*), which would give a push to all documents with a value in that field.
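As a rough sketch of both options: the first helper builds edismax parameters with a boost query on field existence, and the second reorders an already fetched page on the client, which works even when the field is only stored. The boost value of 10, the range form [* TO *] for the existence check, and the function names are assumptions for illustration. A boost only nudges the score rather than guaranteeing order, and the client-side reorder affects just the page that was fetched.

```python
from urllib.parse import urlencode


def boosted_params(user_query, field="listing_thumbnail", boost=10):
    """edismax parameters that boost documents having any value in field."""
    return {
        "defType": "edismax",
        "q": user_query,
        # [* TO *] matches any value; the field must be indexed for this to match.
        "bq": f"{field}:[* TO *]^{boost}",
    }


def complete_first(docs, field="listing_thumbnail"):
    """Move documents with an empty or missing field to the end.

    sorted() is stable, so the original (relevance) order is kept within
    each group.
    """
    return sorted(docs, key=lambda doc: not doc.get(field))


if __name__ == "__main__":
    print(urlencode(boosted_params("apartment")))
    page = [
        {"id": 1},
        {"id": 2, "listing_thumbnail": "a.png"},
        {"id": 3, "listing_thumbnail": ""},
        {"id": 4, "listing_thumbnail": "b.png"},
    ]
    print([doc["id"] for doc in complete_first(page)])
```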

Getting date metadata using SolrCell

I'm using Solr 3.6 to index many different types of documents. I have several fields that define common information for all the documents, one of them being 'date' (ideally last modified date, just something to indicate how recent a document is.)
<field name="date" type="date" indexed="true" stored="true" required="true" />
My problem arises when trying to index rich text documents like .docx and .pdf. I want to fill in the date field using metadata that I get from the ExtractingRequestHandler, but the name of the metadata field that holds the date I want is different for each file. Sometimes the field I want is 'date'; other times it's 'last_modified' or 'last_save_date'. I was trying to use 'last_modified' to provide the date in the handler:
<str name="fmap.last_modified">date</str>
..but this led to problems where date was either multivalued (since there was 'date' metadata) or undefined (because 'last_modified' didn't exist). I looked into using conditional copyFields to try to extract data from at least one of these fields, but that seems complicated (i.e. extending the update handler) and would also require that I know the name of every possible field that could contain this date information.
Is there any way that I can reliably extract a date from every rich-text document that I process?
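One possible client-side approach, sketched here under assumptions rather than offered as a confirmed solution: fetch the metadata first (the ExtractingRequestHandler accepts extractOnly=true for this), pick the first usable date from a list of candidate keys, normalise it to Solr's UTC date form, and send it as the single value of 'date'. The candidate order, the treatment of timezone-less values as UTC, and the function names are all hypothetical, and it still requires listing the keys you are willing to try.

```python
from datetime import datetime, timezone

# Assumed priority order, using the metadata names mentioned in the question.
CANDIDATE_KEYS = ("last_modified", "last_save_date", "date")


def to_solr_date(value):
    """Parse an ISO 8601 string and return it as YYYY-MM-DDThh:mm:ssZ in UTC."""
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # older fromisoformat versions reject "Z"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def pick_date(metadata, keys=CANDIDATE_KEYS, fallback=None):
    """Return the first parseable candidate as a Solr date, else the fallback."""
    for key in keys:
        value = metadata.get(key)
        if isinstance(value, (list, tuple)):
            value = value[0] if value else None  # take one value from a multivalued entry
        if not value:
            continue
        try:
            return to_solr_date(value)
        except ValueError:
            continue  # unparseable, try the next candidate
    return fallback


if __name__ == "__main__":
    print(pick_date({"date": "2011-10-01T23:59:59Z"}))
    print(pick_date({"last_save_date": ["2012-03-04T10:00:00+02:00"]}))
    print(pick_date({"author": "x"}, fallback="1970-01-01T00:00:00Z"))
```

Because the 'date' field is declared required="true", a fallback such as the indexing time is needed for documents where no candidate parses.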

Resources