Is it possible to specify the copyField source as a field in a different collection in Solr?

I am having an issue with partial updates in Solr. Because my collection contains some non-stored fields, the values in those fields are gone after a partial update. So, is it possible to use copyField to copy the original content for a non-stored field from a different collection?

No. copyFields are invoked when a document is submitted for indexing, so I'm not sure how that would work semantically either. In practice, what a copyField instruction does is duplicate the field value when the document arrives at the server and copy it into fields with other names. That model doesn't make sense if a different collection is involved: would it be invoked when documents are submitted to the other collection? (And if so, what happens to the fields local to the actual collection?)
Set the fields to stored="true" if you want to use partial updates with fields that can't support in-place updates (which have very particular requirements: non-stored, non-indexed, single-valued, with numeric docValues).
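For illustration, a copyField rule can only name fields within the same schema; a hypothetical example (field names invented):

    <!-- duplicates the incoming value of "title" into "title_search" at index time -->
    <copyField source="title" dest="title_search"/>
    <!-- there is no syntax for pointing the source at another collection -->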

Related

Reloading an external file field with the server up

I am trying to implement an external file field in order to change ranking values in Solr.
I've defined a field and field type in the schema and, in solrconfig.xml, below the <query> tag, created the external file and added the reload listeners as described in the ref guide:
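(The snippet itself was not included; per the ref guide, the listener configuration looks something like this:)

    <listener event="newSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>
    <listener event="firstSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>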
After server start-up I'm able to sort documents on that previously created field; however, when I change the values while the server is up and issue a new search query, I don't see the updated rank list (nor the updated rank scores).
I also tried adding a reload request handler as suggested in another post, and tried forcing a commit (http://HOST:PORT/solr/update?commit=true), but it says:
DirectUpdateHandler2 No uncommitted changes. Skipping IW.commit.
DirectUpdateHandler2 end_commit_flush
Any suggestions?
Using ExternalFileField for scoring is not really that useful any more, since Solr and Lucene now support in-place updates for values that use docValues.
You can then use those fields directly from your document for scoring, and you can update them without having to update the whole document. That way you don't have to reload anything externally, and your caches can be managed automagically by Solr.
There are three conditions a field has to meet for in-place updates (that being said, atomic updates can also be used, but those require all your fields to be set as stored). From the documentation:
An atomic update operation is performed using this approach only when the fields to be updated meet these three conditions:
- the fields are non-indexed (indexed="false"), non-stored (stored="false"), single valued (multiValued="false") numeric docValues (docValues="true") fields;
- the _version_ field is also a non-indexed, non-stored single valued docValues field; and,
- copy targets of updated fields, if any, are also non-indexed, non-stored single valued numeric docValues fields.
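A minimal sketch of what that looks like in practice (field, collection, and document names are invented for the example):

    <!-- schema.xml: meets all the in-place update conditions -->
    <field name="rank_score" type="pfloat" indexed="false" stored="false" docValues="true" multiValued="false"/>

    # an atomic "set" on an eligible field is executed as an in-place update,
    # rewriting only the docValues rather than the whole document
    curl -X POST 'http://localhost:8983/solr/mycollection/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id": "doc1", "rank_score": {"set": 4.2}}]'

You can then reference rank_score directly in a sort or a boost function (e.g. bf=rank_score) instead of reloading an external file.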

Solr document disappears when I update it

I am trying to update existing documents in a (Sentry-secured) Solr collection. The updates are accepted by Solr, but when I query, the document seems to have disappeared from the collection.
What is going on?
I am using Cloudera (CDH) 5.8.3, and Sentry with document-level access control enabled.
When using document-level access control, Sentry uses a field (whose name is defined in solrconfig.secure.xml, but the default is sentry_auth) to determine which roles can see that document.
If you update a document, but forget to supply a sentry_auth field, then the updated document doesn't belong to any roles, so nobody can see it - it becomes essentially invisible! This is easily done, because the sentry_auth field is typically not a stored field, so won't be returned by any queries.
You therefore cannot just retrieve a document, modify a field, and update it - you need to know which roles the document belongs to, so that you can supply a properly-populated sentry_auth field.
You can make the sentry_auth field a "required" field in the Solr schema, which will prevent you from accidentally omitting it.
However, this won't prevent you from supplying a blank sentry_auth field (or supplying incorrect roles), either of which will also make the document "disappear".
Also note that you can update a document that you do not have document-level access to, provided you have write-access to the collection as a whole, and you have the ID of the document. This means that users can (deliberately or accidentally) over-write or delete documents that they cannot see. This is a design choice, made so that users cannot find out whether a particular document ID exists, when they do not have document-level access to it.
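A rough sketch of both safeguards (the field type, names, and roles here are assumptions; the actual authorization field name comes from your solrconfig.secure.xml):

    <!-- schema: make the authorization field required so it cannot be omitted accidentally -->
    <field name="sentry_auth" type="string" indexed="true" stored="false" multiValued="true" required="true"/>

    # when re-posting an updated document, always include its full set of roles
    curl -X POST 'http://localhost:8983/solr/secured_collection/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id": "doc1", "title": "Quarterly report", "sentry_auth": ["role_analysts", "role_admins"]}]'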
See the Cloudera documentation:
http://blog.cloudera.com/blog/2014/07/new-in-cdh-5-1-document-level-security-for-cloudera-search/
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/search_sentry_doc_level.html
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/search_sentry.html

Document loses its contents after updating it

I posted 3 documents with post.jar and they were indexed successfully; searching for any word from those documents returned the correct document. But when I partially update a document - that is, update just one field - a search for the same word no longer finds it. In other words, after the partial update the document has lost its contents. The fields I updated are ones I defined manually, i.e. in addition to the fields that post.jar creates by itself.
So what is the solution, so that the rest of the document stays the same after a partial update?
Assuming that by "partial update" you mean the Atomic Update feature, then this applies:
In order for Atomic Update to not lose data, all fields in your schema that are not copyField destinations must have stored="true". All fields that ARE copyField destinations must have stored="false".
A further detail required for proper Atomic Update operation: the information in copyField destinations must only originate from copyField sources. If some of the information in a copyField destination originates from the indexing source and some comes from copyField, then the information that originated from indexing will be lost when an Atomic Update is used.
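A minimal schema sketch of that storage layout (field names are invented): every original field is stored, and the catch-all copyField destination is not:

    <!-- original fields: stored="true", so Atomic Update can reconstruct the document -->
    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="author" type="string" indexed="true" stored="true"/>

    <!-- copyField destination: stored="false", repopulated automatically on every update -->
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <copyField source="title" dest="text"/>
    <copyField source="author" dest="text"/>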
Also see the "Field Storage" section found on this page from the Solr documentation:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents#UpdatingPartsofDocuments-AtomicUpdates
I solved my problem by setting stored="false" on all dynamic fields and removing the copyField into text.
Since all of my fields are copied into the text field, after making these changes my problem was solved.

Does copyField require a data re-upload?

I have a Solr instance with lakhs (hundreds of thousands) of records already uploaded. I want to create a new copyField that is a concatenation of two existing fields.
Do I need to repopulate my data?
Yes. From the Solr documentation:
"Fields are copied before analysis is done."
By "analysis" in the copyField context they mean the index analyzer, which is executed when a document is indexed - so the new copyField only takes effect for documents indexed after it is added.
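A sketch of the kind of rule in question (field names are invented). Note that copyField does not literally concatenate strings; with two sources the destination receives two values, so it is typically declared multiValued:

    <field name="first_name" type="string" indexed="true" stored="true"/>
    <field name="last_name" type="string" indexed="true" stored="true"/>
    <field name="full_name" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <copyField source="first_name" dest="full_name"/>
    <copyField source="last_name" dest="full_name"/>

Only documents indexed after the rule is added will have full_name populated; existing documents must be re-posted.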

solr indexing and reindexing

I have a schema with 10 fields. One of the fields is text (the content of a file); the rest of the fields are custom metadata. The document itself doesn't change, but the metadata changes frequently.
Is there any way to skip the document text while re-indexing? Can I index only the custom metadata? And if I skip the document text during re-indexing, does it update the index by removing the text field from the indexed document?
To my knowledge there's no way to selectively update specific fields. An update operation performs a complete replacement of all document data. Since Solr is open source, you could produce your own component for this if you really needed to.
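To illustrate the full-replace semantics (collection, field, and document names are invented): re-posting a document with only its metadata fields silently drops every field that is not included, text content included:

    # suppose doc1 was indexed as {"id": "doc1", "text": "<file contents>", "category": "report"}
    # this update replaces the whole document; afterwards doc1 has no "text" field at all
    curl -X POST 'http://localhost:8983/solr/files/update?commit=true' \
      -H 'Content-Type: application/json' \
      -d '[{"id": "doc1", "category": "invoice"}]'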
