Is there a way to increment a numeric field in Solr that is indexed but not stored?
I.e. I have
<add>
<doc>
<field name="n">10</field>
</doc>
</add>
And the schema is something like:
<field name="n" indexed="true" stored="false" type="tint" />
And I want to do an update on n where, for example, I increment the current value by some value m.
The only thing I can think of is to make the field both stored and indexed, and then, when I want to update the value, query Solr for the existing value and call the update endpoint to write out the new one. Or is there an easier way?
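For reference, that workaround would look roughly like this sketch (it assumes n has been made stored; the local URL, document id 1234, and increment of 5 are made up):
http://localhost:8983/solr/select?q=id:1234&fl=n
<!-- read n (say 10) from the response, add 5, then re-post the whole document -->
<add>
  <doc>
    <field name="id">1234</field>
    <field name="n">15</field>
  </doc>
</add>
Note that re-adding a document replaces it completely, so every field has to be sent again.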
SOLR-139 (updateable documents) was recently committed and will allow add/set/inc updates, but:
fields must be stored
the Fix Version is not set, but I guess it is only on trunk
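Once that lands in a release and n is made stored as well as indexed, the update body would look roughly like this sketch (the document id 1234 and the increment of 5 are made-up values):
<add>
  <doc>
    <field name="id">1234</field>
    <field name="n" update="inc">5</field>
  </doc>
</add>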
Related
I have this Solr field
<field name="listing_thumbnail" type="string" indexed="false" stored="true"/>
Now, when the results are shown, documents without a value in this field should be shown last. Is this possible in Solr? To generalise: is it possible to sort documents on field completeness?
You can make use of the bq (Boost Query) parameter of the dismax/edismax query handler. This allows you to check whether a field has a value and affect the score accordingly, but for that the field needs to be indexed=true.
If you had your field indexed you could add bq=(listing_thumbnail:*) - this would give a push to all documents with a value in that field.
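With the edismax parser, a full request could look roughly like this sketch (the core URL and the main query 'apartment' are made up):
http://localhost:8983/solr/select?q=apartment&defType=edismax&bq=listing_thumbnail:*
Remember that switching the field to indexed="true" only affects documents indexed after the change, so a reindex is needed before the boost does anything.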
I'm trying to import data from an Oracle database into a Solr index. The database entities have lat/long values, and the documents in the index should have a field position. The corresponding configuration in data-config.xml hence is:
<field column="LONGITUDE" name="long_d" />
<field column="LAT" name="lat_d" />
<field column="bl" name="position" template="${data.LAT},${data.LONGITUDE}"/>
where position field is defined as
<field name="position" type="location_rpt" indexed="true" stored="true" multiValued="false"/>
in the schema.xml file.
The problem I have is caused by badly chosen default values of 999.9 for both lat and long in the database, which are not accepted by the DIH as import values for the position field.
So my intention is to simply omit the field position whenever the DB entry has erroneous default values.
Is there something I can define in the configuration file for the DataImportHandler that will give me my desired results?
There are two stages where you can apply changes:
You can use a transformer inside DIH itself
You can use a custom update request processor (URP) chain to replace or get rid of the fields
So, for example, you could use RegexTransformer to replace known bad values with blanks. If that (blank but present fields) causes problems, you could use RemoveBlankFieldUpdateProcessorFactory in a custom chain to drop them.
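As a sketch, such a chain in solrconfig.xml could look like this (the chain name remove-blanks is made up; either mark it as the default chain or attach it to your import requests, e.g. via the update.chain parameter if your version honours it for DIH):
<updateRequestProcessorChain name="remove-blanks">
  <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>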
Hi, below is the data XML which I have inserted into Solr.
<add>
<doc>
<field name="id">3007</field>
<field name="name">Autauga</field>
<field name="coord">POLYGON((-10 30,-40 40,-10 -20,40 20,0 0,-10 30))</field>
</doc>
</add>
There will be many documents of this type, each denoting a separate region.
Now please let me know: how can I search for the documents whose polygon contains a given point?
Your Solr version must be 4 or higher, and you have to add the JTS jar file to Solr's classpath. You also have to define a field with a fieldType of "solr.SpatialRecursivePrefixTreeFieldType". Then you can query using a filter query like fq=geo:"Intersects(10.12 50.02)".
But please see my previous post or http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 for more detailed information.
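As a sketch, the schema side could look like this; the fieldType name location_rpt and the tuning attributes are copied from the Solr 4 example schema, and coord is the field from your document (treat the values as a starting point, not required settings):
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
<field name="coord" type="location_rpt" indexed="true" stored="true"/>
With that in place, a point-in-polygon search is the filter query fq=coord:"Intersects(10.12 50.02)".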
After searching and searching over the net, I've found a possible open-source solution for click-count popularity in Solr (i.e. one that does not require a paid version of LucidWorks Search).
In my next two answers I will try to solve the problem, first in an easy way and then in a slightly more complex way.
But first, some prerequisites.
We assume a Google-like scenario:
1. the user enters some terms in a text field and pushes the search button
2. the system (a custom web-app coupled with Solr) produces a web page with results that are clickable
3. the user selects one of the results (e.g. to access the details) and the system is informed so it can change the 'popularity' of the selected result
The very easy way.
We define a field called 'popularity' in the Solr schema.xml:
<field name="popularity" type="long" indexed="true" stored="true"/>
Suppose the user clicks on the document with id 1234. We (i.e. the webapp) then have to call Solr to update the popularity field of that document, using the URL
http://mysolrappserver/solr/update?commit=true
and posting in the body
<add>
<doc>
<field name="id">**1234**</field>
<field name="popularity" update="inc">1</field>
</doc>
</add>
So, each time the webapp queries Solr (combining/ordering the Solr 'boost' with our custom 'popularity' field), we will obtain a list that is also ordered by popularity.
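One way to do that at query time is an additive boost function on popularity; this is only a sketch, and log(sum(popularity,1)) is just one possible formula:
http://mysolrappserver/solr/select?q=something&defType=edismax&bf=log(sum(popularity,1))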
The more complex idea is to update the Solr index recording not only the user's selection but also the search terms used to obtain the list.
First of all, we have to define a history field in which to store the search terms used:
<field name="searchHistory" type="text_general" stored="true" indexed="true" multiValued="true"/>
Then suppose the user searched for 'something' and selected the document with id 1234 from the result list. The webapp will call the Solr instance at the URL
http://mysolrappserver/solr/update?commit=true
adding a new value to the field searchHistory
<add>
<doc>
<field name="id">**1234**</field>
<field name="searchHistory" update="add">**something**</field>
</doc>
</add>
Finally, using the Solr termfreq function in every subsequent query, we will obtain a 'score' that, combined with the 'boost' field, can produce a list sorted by click-count popularity (and by the history of search terms).
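As a sketch, such a query could boost on the term frequency of the current search term inside searchHistory (the term 'something' and the choice of an additive bf boost are assumptions):
http://mysolrappserver/solr/select?q=something&defType=edismax&bf=termfreq(searchHistory,'something')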
This is an interesting approach; however, I see some disadvantages in it:
Overall item storage will grow dramatically with each and every search.
You're assuming that choosing a specific item is a 100% correct signal and that it wasn't done by mistake or only for a brief look. That way you might get wrong search results along the way.
I suggest only incrementing the counter, or even maintaining a relative counter based on the other results that the user didn't click.
I have a Solr schema as follows:
<field name="category_id1" type="integer" indexed="false" stored="true" />
<field name="category_id2" type="integer" indexed="false" stored="true" />
<field name="category_id3" type="integer" indexed="false" stored="true" />
<field name="category_ids" type="integer" multiValued="true" indexed="true" stored="true"/>
and a copy section:
<copyField source="category_id1" dest="category_ids" />
but whenever I tried to inject the data into DSE/Cassandra, I got this error
InvalidRequestException(why:(Expected 4 or 0 byte int (14)) [diem][business][category_ids] failed validation)
Exception in thread "main" me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:(Expected 4 or 0 byte int (14)) [diem][business][category_ids] failed validation)
at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:45)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:264)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at com.diem.db.crud.CassandraStorageManager.insertMultiColumns(CassandraStorageManager.java:197)
at com.diem.db.dao.impl.AbstractDaoImpl.saveUUIDEntity(AbstractDaoImpl.java:47)
at com.diem.db.dao.impl.BusinessDaoImpl.saveBusiness(BusinessDaoImpl.java:81)
at com.diem.data.LoadBusinesses.execute(LoadBusinesses.java:187)
at com.diem.data.LoadContent.run(LoadContent.java:121)
at com.diem.data.LoadBusinesses.main(LoadBusinesses.java:45)
Caused by: InvalidRequestException(why:(Expected 4 or 0 byte int (14)) [diem][business][category_ids] failed validation)
at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20833)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:258)
... 8 more
Copying into a multiValued solr.IntField (integer) isn't anything special, and we could do it before using DSE/Solr. But I can't seem to get this to work inside the DSE/Solr combination. Logically speaking, I can't see any reason why this fails, because DSE should not interfere with operations on the category_ids field, which is used primarily for indexing. Does anyone see anything wrong with the setup? What could I do to prevent the validation error (note: I can't use a text/string type for category_ids)?
Thank you!
I found out the problem: my CF has default_validation_class=BytesType, so the multiValued field category_ids is validated using BytesType in DSE/Solr, which causes the error. So unless I convert my CF to a CQL declaration using the type LIST<int> and stop using Hector (at least for this CF), I won't be able to work with multiValued fields other than text/string fields in Solr.
If I understand it correctly, you are using thrift tables, so you either declare the category_ids column as UTF8Type (the Solr field can be of any type), or you declare the category_ids Solr field as stored=false (in which case the copy field will not be stored, only indexed).
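For the second option, the schema line would become something like this (the same field with storage turned off):
<field name="category_ids" type="integer" multiValued="true" indexed="true" stored="false"/>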
Let us know if any of the two works for you.