Solr fq: integer comparison on a substring

That is probably a bad title...
But let's say I have a bunch of strings in a multivalued field:
<field name="stringweights" type="text_nostem" indexed="true" stored="true" multiValued="true"/>
Sample data might be:
history:10
geography:33
math:29
Now I want to write an fq that selects all records in Solr where:
stringweights starts with "geography:"
and where the integer value after "geography:" is >= 10.
Is it possible to write a Solr query like that?
(It's not possible to create an integer field in the Solr schema named "geography", another called "math", etc., because these string portions of the field are unknown at design time and can be many hundreds or thousands of different values.)

You may want to look into dynamic fields. Declare a dynamic field in your schema like:
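<dynamicField name="stringweight_*" type="int" indexed="true" stored="true"/>
Use a field type that supports numeric range queries (int in older example schemas, pint in Solr 7 and later). The plain "integer" type from old example schemas indexes the digits as text, so range queries on it compare values lexicographically rather than numerically.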
Then your documents can look like:
stringweight_history: 10
stringweight_geography: 33
stringweight_math: 29
Your filter query is then simply:
fq=stringweight_geography:[10 TO *]
You may need to build a custom indexer to do this, or use a ScriptTransformer with the DataImportHandler, as described here: Dynamic column names using DIH (DataImportHandler).
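For illustration, here is a sketch of what one such document might look like in Solr's XML update format (posted to the /update handler). It assumes the schema's uniqueKey field is called id; the id value is made up.

```xml
<!-- One document: each subject becomes its own stringweight_* dynamic field -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="stringweight_history">10</field>
    <field name="stringweight_geography">33</field>
    <field name="stringweight_math">29</field>
  </doc>
</add>
```

Given a numeric field type behind stringweight_*, fq=stringweight_geography:[10 TO *] matches this document (33 >= 10), while fq=stringweight_geography:[40 TO *] does not.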

Related

Can a Solr search contain a wildcard in the key?

I have a JSON block saved as one document in Solr:
{
"internal":...
"internet":...
"interface":...
"noise":...
"noise":...
}
Could I search with "inter*:*"? I want to find all content whose key starts with "inter".
Unfortunately, I got a parser error. Is there any way to search with a wildcard in the key?
No, not really. You'll have to do that as a copyField if providing a wildcard is important to you, in effect copying everything into a single field and then querying that field.
You can supply multiple fields through qf without specifying each field in the q parameter, as long as you're using the edismax query parser. That's usually more flexible, but it still requires each field to be specified.
There's also a little-known feature named "Field aliasing using per-field qf overrides" (I wasn't aware of it, at least). If I've correctly parsed what I found in a few web searches, you should be able to do f.i_fields.qf=internal internet interface&qf=i_fields, in effect creating an i_fields alias that refers to those three fields. You'll still have to list them explicitly.
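Both edismax suggestions can also be set as defaults on a request handler in solrconfig.xml. A minimal sketch, assuming the /select handler and that internal, internet and interface exist as fields in the schema; the alias name i_fields is taken from the answer above and is otherwise arbitrary.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- qf and per-field qf aliases are edismax features -->
    <str name="defType">edismax</str>
    <!-- a plain q=apple searches all three fields -->
    <str name="qf">internal internet interface</str>
    <!-- q=i_fields:apple is expanded to the same three fields -->
    <str name="f.i_fields.qf">internal internet interface</str>
  </lst>
</requestHandler>
```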
You can use dynamic fields. They allow Solr to index fields that you did not explicitly define in your schema.
This is useful if you discover you have forgotten to define one or more fields. Dynamic fields can make your application less brittle by providing some flexibility in the documents you can add to Solr.
A dynamic field can be defined like this:
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
Please refer to the Solr documentation on Dynamic Fields for more.
After this, create a copy field and copy the dynamic fields into it.
Once that is done, queries can be run against the copy field.
<dynamicField name="inter_*" type="string" indexed="true" stored="true"/>
<field name="internal_static" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="inter_*" dest="internal_static"/>

Copy-transform a numeric field in Solr?

I have a dynamic numeric multivalued field in my solr schema -
<dynamicField name="*_nm" type="float" indexed="true" stored="true" multiValued="true" omitNorms="false"/>
I'd like to run a function score on said field -
_val_:"if(exists(features.width_nm),mul(exp(div(pow(max(0,sub(abs(sub(features.width_nm,12.00000)),0.00000)),2),-51.93702)),10.00000),0.000000)"
but function queries on multivalued fields aren't properly supported in my version of Solr (5.2.1). Trying the above gives the error -
"can not use FieldCache on multivalued field"
My current work-around for this is during indexing to create another field, numeric single-valued, which contains a "reduced" form of the multivalues.
Currently I do this in Java code.
Is there any way for me to do this directly in Solr, for example using a copyField?
Just for completeness - In solr 6.3 I am able to calculate a function-score on a multivalued field by using the field function with a min/max parameter described here.
Thank you very much!
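A sketch of doing the reduction inside Solr at index time with an update request processor chain in solrconfig.xml. CloneFieldUpdateProcessorFactory copies the values, ParseFloatFieldUpdateProcessorFactory turns incoming strings into floats (so the comparison is numeric, not textual), and MaxFieldValueUpdateProcessorFactory keeps only the largest one. These processors should all be available in 5.2.1, but check that. The chain name reduce-width and the destination features.width_max_f are hypothetical; the destination must be single-valued, assumed here to be covered by a new *_max_f dynamic field.

```xml
<!-- schema.xml (assumed): single-valued float field for the reduced value -->
<dynamicField name="*_max_f" type="float" indexed="true" stored="true" multiValued="false"/>

<!-- solrconfig.xml -->
<updateRequestProcessorChain name="reduce-width">
  <!-- copy every value of the multivalued field into the new field -->
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">features.width_nm</str>
    <str name="dest">features.width_max_f</str>
  </processor>
  <!-- parse the copied values as floats so they compare numerically -->
  <processor class="solr.ParseFloatFieldUpdateProcessorFactory">
    <str name="fieldName">features.width_max_f</str>
  </processor>
  <!-- keep only the largest value -->
  <processor class="solr.MaxFieldValueUpdateProcessorFactory">
    <str name="fieldName">features.width_max_f</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Select the chain with update.chain=reduce-width on indexing requests (or as a default of the /update handler), then point the function query at features.width_max_f instead of features.width_nm. MinFieldValueUpdateProcessorFactory works the same way if the minimum is the better reduction.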

How to add custom prefix and suffix string in a field in solr

As the title says, I have tried many ways to add a prefix and/or a suffix to a field in Solr. More precisely:
For example, I have these fields in my schema.xml:
<field name="field1" type="double" indexed="true" stored="true" multiValued="false"/>
<field name="field2" type="double" indexed="true" stored="true" multiValued="false"/>
I would like my third field to look like this:
field3 = "{a prefix}field1 field2{a suffix}"
The problem is that I have seen many ways to copy fields or concatenate other stored fields, but I don't want my prefix or suffix to be stored in Solr; I just want a field3 built from the two other fields plus custom strings that I choose. I am asking here because after a lot of searching I did not find anything suitable.
You can use a StatelessScriptUpdateProcessor to build the contents of the field on the way in, merging it with the values from field1 and field2.
As you can write regular JavaScript in the script referenced from the processor, you can add the prefix and suffix any way you want, without having to supply them in the request.
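A sketch of the processor chain in solrconfig.xml. The script name prefix-suffix.js and the parameter values are hypothetical. Entries in the params list are exposed to the script as a params object, so the prefix and suffix live in the configuration rather than in each request; the script (in the core's conf/ directory) would read field1 and field2 in its processAdd(cmd) function and set field3, which therefore needs a string or text type rather than double. As far as I know, Solr 9 moved this processor into the scripting module and renamed it ScriptUpdateProcessorFactory.

```xml
<updateRequestProcessorChain name="prefix-suffix">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <!-- script file in the core's conf/ directory (hypothetical name) -->
    <str name="script">prefix-suffix.js</str>
    <!-- readable in the script, e.g. params.get("prefix") -->
    <lst name="params">
      <str name="prefix">PRE-</str>
      <str name="suffix">-SUF</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```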

Is it possible to get SOLR DIH to ignore spatial fields for documents with invalid lat/long values?

I'm trying to import data from an Oracle database into a Solr index. The database entities have lat/long values, and the documents in the index should have a field position. The corresponding configuration in data-config.xml is therefore:
<field column="LONGITUDE" name="long_d" />
<field column="LAT" name="lat_d" />
<field column="bl" name="position" template="${data.LAT},${data.LONGITUDE}"/>
where position field is defined as
<field name="position" type="location_rpt" indexed="true" stored="true" multiValued="false"/>
in the schema.xml file.
The problem is caused by badly chosen default values in the database (999.9 for both lat and long), which the DIH does not accept as values for the position field.
So my intention is to simply omit the field position whenever the DB entry has erroneous default values.
Is there something I can define in the configuration file for the DataImportHandler that will give me my desired results?
There are two stages where you can apply changes:
You can use a transformer inside DIH itself
You can use a custom update request processor (URP) chain to replace or get rid of the fields
So, for example, you could use RegexTransformer to replace known bad values with blanks. If blank-but-present fields cause problems, you could use RemoveBlankFieldUpdateProcessorFactory in a custom chain to drop them.
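A sketch that does both steps in the update chain instead: this variant swaps the DIH RegexTransformer for RegexReplaceProcessorFactory, which blanks position when either coordinate is the 999.9 sentinel, and then RemoveBlankFieldUpdateProcessorFactory drops the empty value. The chain name skip-bad-position is hypothetical, and the pattern assumes the template renders the sentinel exactly as 999.9; adjust it to whatever the Oracle columns actually produce.

```xml
<!-- solrconfig.xml -->
<updateRequestProcessorChain name="skip-bad-position">
  <!-- "999.9,x" or "x,999.9" becomes an empty string -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">position</str>
    <str name="pattern">^999\.9,.*|.*,999\.9$</str>
    <str name="replacement"></str>
  </processor>
  <!-- with no arguments this removes empty-string values from all fields -->
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- route DIH imports through that chain -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">skip-bad-position</str>
  </lst>
</requestHandler>
```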

SOLR - Use single text field in schema for full text search

I am getting familiar with SOLR.
I would like to use SOLR for full-text search across many kinds of entities. I don't want to create a Document for every different type of entity. I don't need to search specific fields; I only care whether a given string appears anywhere in an item.
In database terms for example I have a table News and a table Employee and I want to search for the word 'apple', I don't mind in which field it is, I only want to get back the database ID from the records which contain it.
Could it be a solution, that I use a SOLR schema something like this:
<fields>
<field name="id" type="string" indexed="true" stored="true"/>
<field name="content" type="text" indexed="true" stored="false"/>
</fields>
So I only need an ID and the contents. I put all the data I want to be able to search into one 'content' field. When I search for some words, it looks for them in 'id' and in 'content'.
Is this a good idea? Any performance or design problem?
Thanks,
Tamas
See https://wiki.apache.org/solr/SchemaXml#Copy_Fields. It says:
A common requirement is to copy or merge all input fields into a single solr field. This can be done as follows:
<copyField source="*" dest="text"/>
That's typically what is done to search across multiple fields.
But if you don't even want your original fields, just concatenate all your fields into one big content field and index that in Solr. There should be no problems with that.
You can either copyField to text (see the example schema in the distribution) and set that as the default field (the "df" parameter in solrconfig.xml for the select handler).
Or, if you anticipate more complex requirements down the line and/or non-text searches, I would recommend looking at eDismax with qf parameter and it will handle searching all those fields itself.
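A sketch of the copyField option, covering both the schema and the handler default. The catch-all name text follows the wiki quote; its type text_general is an assumption (use whatever analyzed text type your schema defines). The destination must be multiValued because it receives values from many source fields.

```xml
<!-- schema.xml -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="text"/>

<!-- solrconfig.xml: search the catch-all field when q names no field -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="df">text</str>
    <!-- return only the database id, as the question asks -->
    <str name="fl">id</str>
  </lst>
</requestHandler>
```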
