Add user-specified internal version to Solr core? - solr

I have a script that loads information about medications, like you would find in RxNorm, into a Solr core. There's a relatively constant schema for all of the documents. See below.
I would also like to add a document to the core with two properties:
the date on which the core was populated
the version of the software that did the population
Are there established ways to do that? I'm using R's solrium package.
Could this be considered a bad idea? Is there some way to lock the core so changes can't be made after the version document is added? I do have a customized schema.xml, but otherwise this is a pretty vanilla Solr setup.
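For illustration, the kind of metadata document described above might be built like this (a Python sketch, since the question's R/solrium code isn't shown; the id and field names `populated_date` / `client_version` are hypothetical and would need matching entries in schema.xml or a dynamic field):

```python
import json
from urllib.parse import urlencode

# Hypothetical metadata document with a reserved, well-known id.
meta_doc = {
    "id": "core-metadata",
    "populated_date": "2020-08-01T00:00:00Z",
    "client_version": "load_meds.R 1.4.2",
}

# URL and body for a Solr JSON update request (built, not sent here).
update_url = "http://localhost:8983/solr/meds/update?" + urlencode(
    {"commit": "true"})
update_body = json.dumps([meta_doc])
```

The same payload could be posted from solrium with `add()`; the point is only that the "version document" is an ordinary document whose id the loading script agrees on.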
Schema illustration
select?q=medlabel%3Aacetaminophen
gets
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"medlabel:acetaminophen"}},
"response":{"numFound":4269,"start":0,"docs":[
{
"id":"http://purl.bioontology.org/ontology/RXNORM/161",
"medlabel":["acetaminophen"],
"tokens":["acetaminophen"],
"definedin":["http://purl.bioontology.org/ontology/RXNORM/"],
"employment":["IN"],
"_version_":1674388636888465414},
{
"id":"http://purl.obolibrary.org/obo/CHEBI_46195",
"medlabel":["acetaminophen"],
"tokens":["4-acetamidophenol",
"acetaminophen",
"apap",
"panadol",
"paracetamol",
"tylenol"],
"definedin":["http://purl.obolibrary.org/obo/chebi.owl"],
"employment":["active_ingredient"],
"_version_":1674388639675580445},
{
"id":"http://purl.bioontology.org/ontology/RXNORM/1006970",
"medlabel":["acetaminophen / dimenhydrinate"],
"tokens":["/",
"acetaminophen",
"dimenhydrinate"],
"definedin":["http://purl.bioontology.org/ontology/RXNORM/"],
"employment":["MIN"],
"_version_":1674388635062894610}
etc.

You can set a collection in read-only mode after indexing your content into it using MODIFYCOLLECTION. That will effectively give you a read-only collection which does not allow any updates.
My recommendation for your other case would be to have those fields present on each document instead of in a separate document (which, sure, would work as well). But if your number of documents is very large, add a separate document with the metadata you need.
However, you can also use MODIFYCOLLECTION for this to attach properties to the collection itself:
Among the attributes that can be modified are:
other custom properties that use a property. prefix
So you can add property.client_version and property.populated_datetime properties to the collection itself, which would then be replicated properly across your cluster if needed. The collection also has a last index update time available, but this might be node-specific (since commits can happen in different timeframes on each node), and it won't let you attach the client version anyway.
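Both steps go through the Collections API. A minimal Python sketch of the requests involved (host, collection name, and property values are placeholders; the URLs are built but not sent):

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # placeholder host

def modify_collection(collection, **attrs):
    """Build a Collections API MODIFYCOLLECTION URL."""
    params = {"action": "MODIFYCOLLECTION", "collection": collection}
    params.update(attrs)
    return SOLR + "/admin/collections?" + urlencode(params)

# Attach custom metadata as collection properties...
props_url = modify_collection(
    "meds",
    **{"property.client_version": "1.4.2",
       "property.populated_datetime": "2020-08-01T00:00:00Z"})

# ...and, once indexing is finished, make the collection read-only.
lock_url = modify_collection("meds", readOnly="true")
```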

Related

How to transform/update and store incoming JSON property by applying PatternReplaceFilterFactory in SOLR 7.1.0

I am pretty new to SOLR and we have a requirement where I have to modify one of the JSON property values from the incoming request so that it is updated and stored. Something like the below one.
e.g.
{
"name":"Google,CA,94043"
}
When I add this JSON via the add/update documents screen in the SOLR admin, I want this name to be stored as just "Google", so that when I run a search (query) from the SOLR admin it lists the name as "Google", not "Google,CA,94043".
I have added a FieldType with PatternReplaceFilterFactory and referenced it from the name field. The result does not appear with the updated value, but when I analyze the field value (index/query) using the admin tool it shows the value correctly. Not sure how to achieve this.
Let me know if anyone has steps on how to achieve this.
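For reference, the kind of field type described might look something like this (a sketch; the field type name is made up, and the pattern keeps everything before the first comma):

```xml
<!-- Sketch: hypothetical field type with a pattern-replace filter -->
<fieldType name="name_trimmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern=",.*$" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

Note that an analysis chain like this only changes the indexed tokens; the stored value returned in search results is always the original input, which is consistent with the Analysis screen looking correct while query results do not change.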

How to move all documents from one collection to a new one with no routing key specified (6.3.0)

Have a collection c1 that has 400,000 documents and c2 with no documents (new collection).
No routing key is specified in either collection.
Trying to use the migrate Collections API endpoint found/described
in the documentation.
Since no routing key is specified I'm not sure what to use for the split.key parameter. I have found this thread that mentions that a split.key=! should encompass all documents. This has not proven true in my tests.
Here is my attempted url: http://solr.node:8983/solr/admin/collections?action=MIGRATE&collection=c1&split.key=!&target.collection=c2&async=1
This has not worked, and I have tried many iterations of the split.key parameter to no avail. I have tried blank, a!, id!, id, and compositeId, and none of them have migrated any documents to the c2 collection.
How does the Migrate function work with no specified routing key? Is there a default value to use to grab all documents that will actually work?
Thank you!
Edit: since no routing key was specified, the current router is compositeId, as Solr defaults to that when none is given.
I have the same question and would like to get the answer too.
The only answer I found is this from the SOLR guide:
split.key
The routing key prefix. For example, if the uniqueKey of a document is "a!123", then you would use split.key=a!. This parameter is required.
reference: https://solr.apache.org/guide/8_6/collection-management.html
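Following the quoted docs, split.key is derived from the routing prefix embedded in the uniqueKey. A small Python sketch of that relationship, which is also consistent with the observation above that plain ids with no prefix give `split.key=!` nothing to match:

```python
# Sketch: how a compositeId routing prefix relates to split.key.
# For uniqueKey values like "a!123", the part before "!" is the
# routing prefix, and MIGRATE's split.key selects documents by it.

def split_key_for(doc_id):
    """Return the split.key value for a composite document id,
    or None when the id carries no routing prefix (plain ids)."""
    prefix, sep, _ = doc_id.partition("!")
    return prefix + "!" if sep else None
```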

SuiteCommerce Advanced - Show a custom record on the PDP

I am looking to create a feature whereby a User can download any available documents related to the item from a tab on the PDP.
So far I have created a custom record called Documentation (customrecord_documentation) containing the following fields:
Related item : custrecord_documentation_related_item
Type : custrecord_documentation_type
Document : custrecord_documentation_document
Description : custrecord_documentation_description
Related Item ID : custrecord_documentation_related_item_id
The functionality works fine on the backend of NetSuite where I can assign documents to an Inventory item. The stumbling block is trying to fetch the data to the front end of the SCA webstore.
Any help on the above would be much appreciated.
I've come at this a number of ways.
One way is to create a Suitelet that returns JSON of the document names and urls. The urls can be the real Netsuite urls or they can be the urls of your suitelet where you set up the suitelet to return the doc when accessed with action=doc&id=_docid_ query params.
Add a target <div id="relatedDocs"></div> to the item_details.tpl
In your ItemDetailsView's init_Plugins add
$.getJSON('app/site/hosting/scriptlet.nl...?action=availabledoc')
    .then(function (data) {
        var asHtml = format(data); // however you like
        $("#relatedDocs").html(asHtml);
    });
You can also go the whole module route. If you created a third party module DocsView then you would add DocsView as a child view to ItemDetailsView.
That's a little more involved so try the option above first to see if it fits your needs. The nice thing is you can just about ignore Backbone with this approach. You can make this a little more portable by using a service.ss instead of the suitelet. You can create your own ssp app for the function so you don't have to deal with SCAs url structure.
It's been a while, but you should be able to access the JSON data from within the related Backbone View class. From there, within the return context, output the value you're wanting to the PDP. Hopefully you're extending the original class and not overwriting / altering the core code :P.
The model associated with the PDP should hold all the JSON data you're looking for. Model.get('...') sort of syntax.
I'd recommend against Suitelets for this, as that adds extra execution time and is a bit slower.
I'm sure you know, but you need to set the documents to be available as public as well.
Hope this helps, thanks.

Sitecore 7.2 and SOLR: exclude clones from web index

I'm trying to exclude all clones from Sitecore's web index. I've created a custom crawler inheriting from Sitecore.ContentSearch.SitecoreItemCrawler overriding the IsExcludedFromIndex method with the following code:
protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation)
{
    if (indexable.Item["Hide from Search"] == "1")
        return true;

    if (indexable.Item.IsClone)
        return true;

    return base.IsExcludedFromIndex(indexable, checkLocation);
}
My "Hide from Search" field works: any items with that field set are not included in the web index. However, the indexable.Item.IsClone is never true, and all "clones" remain in the web index.
When I run the master index against this crawler, IsClone is true for each clone and they are not included in the index. I suspect it works for master and not for the web index because clones are expanded on publishing targets (as noted by John West).
Apologies if this question is considered a duplicate of Globally exclude cloned items from index? - the solution there did not work for me, and I'm using SOLR (vs Lucene) and on a newer version of Sitecore, so I believe this may be a separate issue.
So, how can I exclude all clones from a SOLR index of a Sitecore 7.2 web (publish target) database?
As you wrote in your question, the IsClone property is not relevant for items which are published, because Sitecore clears the value of the __Source field.
That's why there is no out-of-the-box method to determine whether an item in the web database was a clone or not.
What you can use is the solution proposed by John West in his blog post Identify Cloned Items Sitecore ASPNET CMS Publishing Target Databases. In a nutshell, you need to add your own processor to the publishing pipeline and save the value of the __Source field in another custom field, or at least store a boolean value in a custom Is Cloned field.
Then you can use your approach, just instead of checking IsClone you check whether the new custom field is non-empty.

Change type of Datastore field to Text from String?

I can't seem to do this. The list of types in the App Engine datastore viewer does not contain Text as an option. I had to change my fields because some of my values were too long for String, but now I can't retroactively fix my old entries.
To change the property type used by the old entities, you need to manually update each of them.
This can be easily and efficiently accomplished using the mapper API. This guide explains how to use this API.
You may also want to read this blog post by Nick Johnson.
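The per-entity conversion itself is small. Here is a sketch with the type-wrapping logic pulled out as a plain function; the datastore calls, which need the App Engine SDK, are shown only in comments, and the model `Note` / property `body` are made-up names:

```python
def to_text_value(value, text_type):
    """Wrap a plain string in the datastore Text type (db.Text in
    the legacy SDK), leaving already-wrapped values untouched."""
    if isinstance(value, str) and not isinstance(value, text_type):
        return text_type(value)
    return value

# With the SDK available it would be applied per entity, e.g.:
# from google.appengine.ext import db
# for note in Note.all():          # or via a mapper job for large kinds
#     note.body = to_text_value(note.body, db.Text)
#     note.put()
```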
You don't have to fix your old entries. The old ones should work as is, and the new ones just won't be indexed.
See
http://groups.google.com/group/google-appengine/browse_thread/thread/282dc825f9c46684 .
