Updatable Fields In Solr

I am using Solr to search my corpus of web page data. My indexer creates several fields and corresponding values, but some of these fields I want to update more often, for example the number of clicks on a page. These fields need not be indexed, and I don't need to search on their values; I do, however, want to fetch them and update them often.
I am a newbie in Solr, so a more descriptive answer, perhaps with a running example or code, would help me better.

If you are on Solr 4+, yes, you can push a partial (atomic) update to the Solr index.
For partial updates to work, all fields in your schema.xml need to be stored, because Solr rebuilds the whole document from the stored values.
This is what your fields section should look like:
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="title" type="text_general" indexed="true" stored="true" />
  <field name="description" type="text_general" indexed="true" stored="true" />
  <field name="body" type="text_general" indexed="true" stored="true" />
  <field name="clicks" type="int" indexed="true" stored="true" />
</fields>
Now when you send a partial update for one of the fields, e.g. "clicks" in your case, Solr will in the background fetch the stored values of all the other fields for that document (title, description, body), delete the old document, and push the new, updated document to the index.
curl 'http://localhost:8080/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"1","clicks":{"set":100}}]'
Here is good documentation on partial updates: http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
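For a click counter specifically, the inc operation is a better fit than set, because the client never has to read the current value first. Below is a minimal sketch of the JSON body such a request carries; the endpoint and field names follow the example above, and the incClicksPayload helper is illustrative, not part of Solr or SolrJ:

```java
public class ClickUpdate {
    // Build the atomic-update JSON body that increments the "clicks"
    // field of the document with the given id.
    static String incClicksPayload(String id, int by) {
        return "[{\"id\":\"" + id + "\",\"clicks\":{\"inc\":" + by + "}}]";
    }

    public static void main(String[] args) {
        // POST this body to /solr/update?commit=true with
        // Content-type: application/json
        System.out.println(incClicksPayload("1", 1)); // [{"id":"1","clicks":{"inc":1}}]
    }
}
```

Each such request bumps clicks by one without touching title, description, or body.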

Sample Solr partial-update code.
Prerequisites: the fields need to be stored, and you need to configure the update log path under the direct update handler:
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Enables a transaction log, used for real-time get, durability,
       and SolrCloud replica recovery. The log can grow as big as the
       uncommitted changes to the index, so use of a hard autoCommit
       is recommended (see below).
       "dir" - the target directory for transaction logs; defaults to
       the Solr data directory. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
Code:
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PartialUpdate {
    public static void main(String[] args) throws SolrServerException, IOException {
        SolrServer server = new HttpSolrServer("http://localhost:8080/solr");

        SolrInputDocument doc = new SolrInputDocument();
        // Atomic update operations:
        //   set - set a field's value
        //   add - add a value to a multi-valued field
        //   inc - increment a numeric field
        Map<String, String> partialUpdate = new HashMap<String, String>();
        partialUpdate.put("set", "peter"); // value that needs to be set

        doc.addField("id", "122344545");      // unique id of the existing document
        doc.addField("fname", partialUpdate); // fname of document 122344545 will be set to 'peter'

        server.add(doc);
        server.commit(); // make the update visible to searches
    }
}

Related

Adding child documents to existing Solr 6.4 collection documents creates duplicate documents

This question is similar to "Solr doesn't overwrite - duplicated uniqueKey entries", but I have a large body of existing documents that were already added to the collection with no child documents, and I am using standalone (not cloud) Solr 6.4 rather than 5.3.1. We recently enabled child documents so that we could store richer data.
We use SolrJ to load data into and query Solr, but to isolate the issue we're seeing, I used the command line Solr post tool to upload the following document:
<add>
  <doc>
    <field name="id">1</field>
    <field name="solr_record_type">1</field>
    <field name="title">Fabulous Book</field>
    <field name="author">Angelo Author</field>
  </doc>
</add>
Search results were as expected, using q=id:1 and
fl=id,title,index_date,[child parentFilter="solr_record_type:1"]:
"response":{"numFound":1,"start":0,"docs":[
{
"id":"1",
"title":"Fabulous Book",
"index_date":"2019-01-16T23:06:57.221Z"}]
}
Then I updated the document by posting the following:
<add>
  <doc>
    <field name="id">1</field>
    <field name="solr_record_type">1</field>
    <field name="title">Fabulous Book</field>
    <field name="author">Angelo Author</field>
    <doc>
      <field name="id">1-1</field>
      <field name="solr_record_type">2</field>
      <field name="contributor_name">Polly Math</field>
      <field name="contributor_type">3</field>
    </doc>
  </doc>
</add>
Then, repeating my search on the unique id field, I got the following duplicate result, which is undesirable:
"response":{"numFound":2,"start":0,"docs":[
{
"id":"1",
"title":"Fabulous Book",
"index_date":"2019-01-16T23:06:57.221Z",
"_childDocuments_":[
{
"id":"1-1",
"solr_record_type":2,
"contributor_name":"Polly Math",
"contributor_type":3,
"index_date":"2019-01-16T23:09:29.142Z"}]},
{
"id":"1",
"title":"Fabulous Book",
"index_date":"2019-01-16T23:09:29.142Z",
"_childDocuments_":[
{
"id":"1-1",
"solr_record_type":2,
"contributor_name":"Polly Math",
"contributor_type":3,
"index_date":"2019-01-16T23:09:29.142Z"}]}]
}
Going the other way, if I start with a document that was loaded initially with a child document, like the following:
<add>
  <doc>
    <field name="id">2</field>
    <field name="solr_record_type">1</field>
    <field name="title">Wonderful Book</field>
    <field name="author">Andy Author</field>
    <doc>
      <field name="id">2-1</field>
      <field name="solr_record_type">2</field>
      <field name="contributor_name">Polly Math</field>
      <field name="contributor_type">3</field>
    </doc>
  </doc>
</add>
And then I update it with a document with no children:
<add>
  <doc>
    <field name="id">2</field>
    <field name="solr_record_type">1</field>
    <field name="title">Wonderful Book</field>
    <field name="author">Andy Author</field>
  </doc>
</add>
The result still has the child:
"response":{"numFound":1,"start":0,"docs":[
{
"id":"2",
"title":"Wonderful Book",
"index_date":"2019-01-16T23:09:39.389Z",
"_childDocuments_":[
{
"id":"2-1",
"title_id":2,
"title_instance_id":2,
"solr_record_type":2,
"contributor_name":"Polly Math",
"contributor_type":3,
"index_date":"2019-01-16T23:07:04.861Z"}]}]
}
This is strange, because if I replace a document that has two child documents with one that has only one child, Solr does drop the second child; here, though, it is not dropping the child at all.
Updates of childless documents that stay childless, and updates of documents with children that keep at least one child, both seem to work as I'd expect.
I have a large body of existing documents that don't have children, which I may be adding children to, and eventually I may have a lot of child-having documents that might drop their children. Given that, what is the best way to update these records without generating duplicate records or losing updates?
I would strongly advise avoiding Solr parent/child relationships. We decided to use them in Solr 5.3.1, and it turns out that although much of the functionality is there, a number of nasty bugs have been present in Solr since 4.x and remain unfixed, including:
SOLR-6096: Support Update and Delete on nested documents
SOLR-5211: updating parent as childless makes old children orphans (UPDATE: fixed in 8.0)
SOLR-6596: Atomic update and adding child doc not working together
SOLR-5772: duplicate documents between solr "block join" documents and "normal" document
SOLR-10030: SolrClient.getById() method in Solrj doesn't retrieve child documents
For those reasons, if at all possible, I strongly recommend AVOIDING child documents. Even if these issues don't hit you now, they will at some point, and it's clear, given that they have not been fixed across three or four major versions, that there is no real support in the product for child documents. Sorry to be the bearer of bad news, but hopefully someone can learn from our experience.
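If you do have to update block-joined documents on these versions, the workaround usually suggested is to delete the whole block first and then re-add the parent together with the complete, current set of children in one request. A sketch in update-XML form, assuming the default _root_ field Solr uses to mark block membership (the ids are illustrative):

```xml
<!-- 1. Delete the old block: the parent and any children indexed with it. -->
<delete><query>_root_:1</query></delete>

<!-- 2. Re-add the parent with all of its current children in a single add. -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="solr_record_type">1</field>
    <doc>
      <field name="id">1-1</field>
      <field name="solr_record_type">2</field>
    </doc>
  </doc>
</add>
```

This avoids both the duplicate parents and the orphaned children described above, at the cost of always reindexing the full block.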

Update (not replace) Solr data with solrj library [duplicate]

This question already has an answer here: Update specific field in Solr (1 answer). Closed 6 years ago.
Suppose I have a Solr index with the following structure:
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="field_1" type="string" indexed="true" stored="true"/>
<field name="field_2" type="string" indexed="true" stored="true"/>
which already has some data. I want to replace the data in field "field_1", but the data in field "field_2" has to stay untouched.
For a while I have been using curl with a JSON file for this task. An example of the JSON file:
[
  {"id":1,"field_1":{"set":"some value"}}
]
The data in this file replaces the value only in field "field_1".
Now I have to do the same with the solrj library.
Here are some code snippets to explain my attempts:
List<SolrInputDocument> documents = new ArrayList<>();
SolrInputDocument doc = new SolrInputDocument();
doc.addField("field_1", "some value");
documents.add(doc);
ConcurrentUpdateSolrClient server = new ConcurrentUpdateSolrClient(solrServerUrl, solrQueueSize, solrThreadCount);
UpdateResponse resp = server.add(documents, solrCommitTimeOut);
When I run this code, the value of "field_1" becomes "some value", but the value of "field_2" becomes null.
How can I avoid replacing the value in field "field_2"?
Because you are doing a full update, you are overwriting the entire previous document with a new one, which does not have field_2.
You need to do a partial (atomic) update instead, as explained here (scroll down to the SolrJ section):
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
SolrJ code for an atomic update:
String solrBaseurl = "http://hostname:port/solr";
String collection = "mydocs";

SolrClient client = new HttpSolrClient(solrBaseurl);
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "test"); // unique id of the document to update

Map<String, String> cmd1 = new HashMap<>();
Map<String, String> cmd2 = new HashMap<>();
cmd1.put("set", "newvalue");        // replace the current value of field1
cmd2.put("add", "additionalvalue"); // append to the multi-valued field2
doc.addField("field1", cmd1);
doc.addField("field2", cmd2);

client.add(collection, doc);
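Note that set, add, and inc are just keys in a single-entry map (recent Solr versions also accept remove for multi-valued fields), so a tiny helper keeps the operation name next to its value. The helper below is illustrative, not part of SolrJ:

```java
import java.util.HashMap;
import java.util.Map;

public class AtomicOps {
    // Wrap a value in the single-entry map that SolrJ interprets as an
    // atomic-update operation ("set", "add", "inc", ...).
    static Map<String, Object> op(String operation, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put(operation, value);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(op("set", "newvalue")); // {set=newvalue}
        System.out.println(op("inc", 1));          // {inc=1}
    }
}
```

With it, the update above reads closer to the intent: doc.addField("field1", op("set", "newvalue")).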

Geoloc is incorrect after updating parts of a document in Solr

I want to update a particular field of a document in Solr, but after the update the field *_coordinate is converted from tdouble to an array. How can I fix this? I use Apache Solr 6.2.1.
This is my dynamicField in the schema file:
<!-- Type used to index the lat and lon components for the "location" FieldType -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="true" useDocValuesAsStored="false" />
This is the code that I use to update a field:
String solrID = (String) currentDoc.getFieldValue("id");
SolrInputDocument solrDocToIndex = new SolrInputDocument();
solrDocToIndex.addField("id", solrID);
Map<String, String> partialUpdate = new HashMap<>();
partialUpdate.put("add", "Solr Demo");
solrDocToIndex.addField("tags", partialUpdate);
I have a field geoloc_0_coordinate. Before the update it had the value 12.123456, but after running the update code it changed to [12.123456,12.123456].

Sitecore _path field returns NULL in Solr index

I am using a Solr index for Sitecore.
However, the search result always gives back null for the _path field.
It was working on Lucene. Does Solr need special treatment?
Below is the Glass Mapper property:
[IndexField("_path"), TypeConverter(typeof(IndexFieldEnumerableConverter))]
[SitecoreIgnore]
public virtual System.Collections.Generic.IEnumerable<ID> EntityPath { get; set; }
And the Solr schema has the entry below:
<field name="_path" type="string" indexed="true" stored="false" multiValued="true" />
Change your "stored" setting to true:
<field name="_path" type="string" indexed="true" stored="true" multiValued="true" />
The stored attribute makes sure that the original value is kept in the index for retrieval. Otherwise you can search on the field, but not fetch its value.

Is it possible to have a static index field for Liferay using solr-web plugin?

Can anyone tell me if I can associate a static index field for Liferay using the solr-web plugin? Is there a way to define a static index field in Solr?
I need something similar to the following configuration in Nutch:
<property>
  <name>index.static</name>
  <value>source:nutch</value>
</property>
This adds the field "source" with the value "nutch" to every document in Nutch. Is there anything similar for Liferay + Solr?
I'm not sure about the Liferay configuration, but you can add a default value in schema.xml, which will be applied to all documents:
<field name="source" type="string" indexed="true" stored="true" default="Nutch" />
